I.  INTRODUCTION

In  the  design  and  analysis  of  statistical  signal  processing  procedures, 
it is usually assumed that some underlying spectral distribution or probability
distribution is known precisely. Often in practice this is an
unrealistic  assumption.  Furthermore,  as  we  will  illustrate  in  Chapter  II 
of  this  thesis,  the  belief  that  nearly  accurate  models  will  result  in 
nearly  optimal  solutions  is  frequently  unfounded.  Thus,  we  would  like  to 
design  procedures  which  are  insensitive  to  small  deviations  from  an  assumed 
model.  Such  procedures  have  generally  been  termed  robust. 

In  1960,  Tukey  [39]  brought  attention  to  the  fact  that  a  number  of 
statistical  data-analysis  procedures  are  undesirably  sensitive  to  small 
deviations  from  the  assumed  probability  distribution  of  the  observations. 
During  the  1960's,  two  basic  approaches  to  the  problem  of  designing  robust 
alternatives to such procedures were developed. The first, which could
be termed the "minimax" or "Huber" approach, consists of first modeling
the uncertainty via a class of probability distributions and then
finding a procedure which has the best worst-case performance over
this class (see [43], [14], [27], [6] and [44]). The other approach, which
was originated by Hampel [45], views robustness in terms of the continuity
properties of a procedure on a space of probability distributions (see
[45] and [44]).

These  techniques  were  first  applied  in  a  statistical  signal  processing 
context  by  Martin  and  Schwartz  [40]  who  considered  the  design  of  robust 
signal detection procedures. Other results in robust detection were
subsequently obtained by Kassam and Thomas [46], El-Sawy and VandeLinde [5],
[8], Kuznetsov [22], Poor [53] and many others (for a survey, see [47]).
Robust parameter estimation has also been considered in a signal processing
context by Martin [50], Papantoni-Kazakos [51], Price and VandeLinde [52]
and  others  (see  [48]  for  a  survey).  Further,  results  have  been  developed 
for  robust  nonlinear  filtering  (see  Martin  [49]  for  a  survey  of  this  area). 
Recently,  Kassam  and  Lim  [1]  ,  Cimini  and  Kassam  [25]  and  Poor  [2],  [7]  have 
developed  a  method  of  designing  Wiener  filters  which  are  robust  with 
respect  to  spectral  uncertainty. 

This thesis considers several topics in the general area of robust
signal processing. In particular, in Chapter II of this thesis, we present
results from a numerical study of the performance of the robust Wiener
filters designed via the methods of [1], [2], [25]. We begin Chapter II
by  examining  the  effects  of  spectral  uncertainty  on  traditional  Wiener 
filters.  We  show  that  in  many  cases  a  clear  need  for  robust  Wiener  filtering 
exists  and  that  in  many  of  these  cases  the  robust  Wiener  filters  developed 
in  [1],  [2],  [25]  are  an  effective  alternative  to  traditional  Wiener  filters. 

In  Chapter  III,  using  a  general  formulation  analogous  to  that  developed 
in  [2]  for  robust  linear  continuous-time  (Wiener)  noncausal  filtering,  we 
develop  a  method  of  designing  robust  linear  discrete-time  (Wiener-Kolmogorov) 
causal  signal  estimators  (e.g.,  robust  n-step  predictors,  robust  causal 
filters  and  robust  n-lag  smoothers).  The  specific  problem  of  robust  one- 
step  noiseless  prediction  is  developed  in  detail  and  numerical  results  are 
given  for  the  particular  problem  of  robust  filtering  in  white  noise. 

In  Chapter  IV,  we  present  a  generalization  of  the  results  of  Huber 
and  Strassen  [6].  In  [6],  a  cohesive  theory  of  (minimax)  robust  hypothesis 
testing  was  developed  for  the  quite  general  situation  in  which  uncertainty 
is modeled via classes of probability distributions dominated by 2-alternating
Choquet capacities. These results have been applied by Poor [7], [32] to
problems  in  communication  theory  in  which  spectral  uncertainty  is  modeled 
via  capacity  classes  of  spectral  distributions.  In  this  chapter  we  extend 
the  theory  of  [6]  to  certain  situations  which  are  appropriate  in  this  new 
context.  For  example,  we  show  in  Chapter  IV  how  the  problem  of  robust 
continuous-parameter  smoothing  of  an  uncertain  signal  in  white  noise  now 
fits  within  the  general  framework  developed  in  [7].  Furthermore,  the 
results  given  in  Chapter  IV  partially  extend  the  usefulness  of  the  results 
of [6] to noncompact measure spaces. Finally, the band model, an uncertainty
class which is appropriate for many applications, is shown to be a
2-alternating capacity class and is used to illustrate certain results of
this  chapter. 

In  Chapter  V,  a  commonly  used  model  of  uncertainty  known  as  the  p-point 
class  is  examined.  It  is  shown  that,  while  the  p-point  class  is  not  a 
capacity  class,  it  is  contained  in  a  capacity  class  which  we  call  an 
extended  p-point  class.  In  many  instances  the  results  of  [6],  [7],  [32] 
which,  of  course,  hold  for  this  extended  p-point  class  are  shown  to  hold 
also  for  the  corresponding  p-point  class. 


II.  AN ANALYSIS OF THE EFFECTS OF SPECTRAL UNCERTAINTY ON WIENER FILTERING


1.  Introduction 

The  solution  to  the  traditional  stationary  linear  (i.e.,  Wiener) 
filtering  problem  requires  exact  knowledge  of  the  signal  and  noise  spectra. 
Often  in  practice  it  is  unrealistic  to  assume  such  knowledge.  Despite  this, 
Wiener  filters  are  widely  used  for  steady-state  filtering.  In  this  chapter 
we  consider  the  performance  of  Wiener  filtering  when  the  signal  and  noise 
spectra  differ  to  a  small  degree  from  those  assumed  in  the  design  process. 

In  Chapter  III,  we  will  consider  the  related  problem  of  discrete-time 
(Wiener-Kolmogorov)  signal  estimation  when  spectral  uncertainty  exists. 

In  Section  2  of  this  chapter  we  consider  the  Wiener  filter  for  a 
particular  signal  and  noise  spectral  pair  which  would  be  natural  to  assume 
is  the  true  spectral  pair.  We  then  look  again  at  our  circumstances  and 
model  the  uncertainty  we  might  have  about  our  choice  of  spectra.  In  so 
doing we find that the potential exists for totally unacceptable performance
degradation in the presence of even small degrees of uncertainty.

In  Section  3  we  consider  filters  termed  "robust".  These  filters  are 
designed  to  have  the  best  "worst-case"  performance  over  uncertainty  classes 
of  spectra.  The  method  of  design  is  due  to  Poor  [2]  and  was  based  on  the 
work  of  Kassam  and  Lim  [1].  As  we  will  see,  the  advantage  of  these  robust 
filters  is  that  they  are  least  sensitive  in  the  sense  that  they  have  the 
smallest  possible  maximum  deviation  from  optimality  within  the  constraints 
imposed  by  our  uncertainty. 

Of  course  there  is  a  trade-off  involved  in  robust  filtering.  While 
the robust filter has better worst-case performance, we cannot expect it
to have optimal performance should our original choice of spectra be the
true ones. In Section 3 we will consider this trade-off as well.

2.  The  Sensitivity  of  the  Wiener  Filter  to  Spectral  Uncertainty 

The  mean-square-error  (MSE)  for  linear  filtering  of  a  signal  in 
uncorrelated  additive  noise,  where  both  signal  and  noise  are  modeled  as 
zero-mean,  second-order,  wide-sense  stationary  random  processes,  is 
given  by 


$$ e(\sigma,\nu;H) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \left[ \sigma(\omega)\,|1-H(\omega)|^2 + \nu(\omega)\,|H(\omega)|^2 \right] d\omega , \tag{2.1} $$


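where $H$ is the transfer function of the filter and $\sigma$ and $\nu$ are the power
spectral densities (PSD's) of the signal and noise, respectively. For a
fixed signal and noise spectral pair $(\sigma,\nu)$, $e(\sigma,\nu;H)$ is minimized by
the Wiener filter

$$ H^*(\omega) = \frac{\sigma(\omega)}{\sigma(\omega)+\nu(\omega)} \tag{2.2} $$

and the minimum MSE is

$$ e^*(\sigma,\nu) \triangleq e(\sigma,\nu;H^*) = \frac{1}{2\pi}\int_{-\infty}^{\infty} H^*(\omega)\,\nu(\omega)\,d\omega . \tag{2.3} $$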
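Equations (2.1) through (2.3) are easy to check numerically. The following is a minimal sketch and not code from the thesis: the helper names, the midpoint rule with the substitution $\omega = \tan\theta$, the restriction to real-valued $H$, and the two example spectra are all assumptions made for illustration.

```python
import math

def integrate_real_line(f, n=20000):
    """Midpoint rule for the integral of f over the real line, using w = tan(theta)."""
    step = math.pi / n
    total = 0.0
    for i in range(n):
        theta = -math.pi / 2.0 + (i + 0.5) * step
        w = math.tan(theta)
        total += f(w) / math.cos(theta) ** 2  # dw = dtheta / cos(theta)^2
    return total * step

def mse(sigma, nu, H):
    """Equation (2.1), restricted to a real-valued transfer function H."""
    def integrand(w):
        h = H(w)
        return sigma(w) * (1.0 - h) ** 2 + nu(w) * h ** 2
    return integrate_real_line(integrand) / (2.0 * math.pi)

def wiener_filter(sigma, nu):
    """Equation (2.2)."""
    return lambda w: sigma(w) / (sigma(w) + nu(w))

def min_mse(sigma, nu):
    """Equation (2.3)."""
    H = wiener_filter(sigma, nu)
    return integrate_real_line(lambda w: H(w) * nu(w)) / (2.0 * math.pi)

# Hypothetical unit-power spectra, chosen only for illustration.
def sigma_example(w):
    return 2.0 / (1.0 + w * w)

def nu_example(w):
    return 20.0 / (100.0 + w * w)

if __name__ == "__main__":
    H_star = wiener_filter(sigma_example, nu_example)
    # The two printed numbers should agree: (2.1) evaluated at H* equals (2.3).
    print(mse(sigma_example, nu_example, H_star), min_mse(sigma_example, nu_example))
```

Passing any other gain to `mse`, for example `lambda w: 0.5`, should give a larger value than `min_mse` for the same spectral pair.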


Unfortunately, as we discussed in Section 1, it is often the case
in practice that our knowledge of the signal and/or noise PSD's is inexact.
If the $\sigma$ and $\nu$ we choose for designing $H^*$ are not the true spectra, then
our filter will generally have less than optimal performance. To illustrate
the degree of performance degradation that can result from such mis-modeling,
we consider the following examples. The numerical results presented here
and in the following section comprise a representative selection from an
extensive numerical study.


The p-point class. For a number of applications it is natural to
assume that we have a narrow-band first-order Markov signal in wide-band
first-order Markov noise, i.e., that

$$ \sigma_0(\omega) = \frac{2\alpha_S v_S^2}{\alpha_S^2+\omega^2} \quad\text{and}\quad \nu_0(\omega) = \frac{2\alpha_N v_N^2}{\alpha_N^2+\omega^2} , \tag{2.4} $$

where $\alpha_S \ll \alpha_N$ are the 3 dB bandwidths and $v_S^2$ and $v_N^2$ are the powers of the
signal and noise, respectively. For Fig. 1 we have $\alpha_N = 10$ and $\alpha_S = 1$.

In the figures of this chapter we have used a measure of performance
which we refer to simply as output signal-to-noise ratio (SNR). The purpose
of Wiener filtering is to minimize the MSE, $E\{[\hat{S}(t) - S(t)]^2\}$,
between our estimate $\hat{S}(t)$ (i.e., the output of the filter) and
the actual signal $S(t)$. Since the output of the filter can be written as
$S(t) + (\hat{S}(t) - S(t))$, we use the signal power divided by the MSE as an output
SNR. For the purpose of our graphs we translate this to dB. The horizontal
axis is $10 \log_{10}(v_S^2/v_N^2)$, the input SNR in dB.

The top line in Fig. 1 gives the performance of the Wiener filter $H_0^*$,
designed using $\sigma_0$ and $\nu_0$ of (2.4) in equation (2.2), when $\sigma_0$ and $\nu_0$ are, in
fact, the signal and noise spectra which occur. For this case it is
straightforward, via equation (2.3), to show that

FIG. 1. p-point example. (From top to bottom) $H_0^*$ at $(\sigma_0,\nu_0)$;
"trivial filtering"; $H_0^*$ at its worst case. (Horizontal axis: input SNR in dB.)
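The nominal-case curve of Fig. 1 can be reproduced from (2.3) and (2.4). The sketch below is an illustration under stated assumptions: the closed form in `nominal_min_mse` was derived for this example by integrating (2.3) with the spectra (2.4), and is not quoted from the text; the function names are hypothetical, and `numeric_min_mse` is included only as an independent check of that derivation.

```python
import math

def markov_spectrum(alpha, power):
    """First-order Markov PSD of (2.4): 2*alpha*power / (alpha^2 + w^2)."""
    return lambda w: 2.0 * alpha * power / (alpha * alpha + w * w)

def nominal_min_mse(alpha_s, alpha_n, vs2, vn2):
    """Closed form of (2.3) for the pair (2.4); derived for this sketch."""
    num = vs2 * vn2 * math.sqrt(alpha_s * alpha_n)
    den = math.sqrt((alpha_s * vs2 + alpha_n * vn2) * (alpha_n * vs2 + alpha_s * vn2))
    return num / den

def numeric_min_mse(alpha_s, alpha_n, vs2, vn2, n=20000):
    """Same quantity by midpoint integration of (2.3), with w = tan(theta)."""
    sigma = markov_spectrum(alpha_s, vs2)
    nu = markov_spectrum(alpha_n, vn2)
    step = math.pi / n
    total = 0.0
    for i in range(n):
        theta = -math.pi / 2.0 + (i + 0.5) * step
        w = math.tan(theta)
        total += sigma(w) * nu(w) / (sigma(w) + nu(w)) / math.cos(theta) ** 2
    return total * step / (2.0 * math.pi)

def output_snr_db(vs2, mse):
    """Output SNR as defined in the text: signal power over MSE, in dB."""
    return 10.0 * math.log10(vs2 / mse)

if __name__ == "__main__":
    # Fig. 1 bandwidths (alpha_S = 1, alpha_N = 10), noise power fixed at 1.
    for snr_db in (-20, -10, 0, 10, 20):
        vs2 = 10.0 ** (snr_db / 10.0)
        print(snr_db, output_snr_db(vs2, nominal_min_mse(1.0, 10.0, vs2, 1.0)))
```

When the two bandwidths are equal the closed form reduces to $v_S^2 v_N^2/(v_S^2+v_N^2)$, which is the expected result for spectra of identical shape.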


Now, suppose that the only information about which we are certain
is the powers of the signal $v_S^2$ and the noise $v_N^2$ and that we have estimated
with sufficient accuracy the fractional power of each on the set
$S = \{\omega \text{ real} \mid |\omega| \le 1\}$. We denote the signal and noise fractional powers by
$p_S$ and $p_N$, respectively (e.g., $(2\pi)^{-1}\int_S \sigma(\omega)\,d\omega = p_S v_S^2$). In particular, for
the example considered above, we have $p_S = 0.5$ and $p_N = 0.063$. If these
total powers and fractional powers are all we can really be certain of, we
would like to know how badly the performance of $H_0^*$ can deteriorate. The
bottom line in Fig. 1 gives the worst-case performance of $H_0^*$. The middle
line represents what we can do trivially for any pair of spectra by using an
all-pass filter ($H = 1$) when the input SNR is positive and by using a
no-pass filter ($H = 0$) when the input SNR is negative. Thus we see that if
the spectra are actually first-order Markov then our filter does well, but
if not we can do significantly worse than trivial filtering.
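The fractional powers quoted above and the "trivial filtering" baseline both follow from short calculations. The sketch below is illustrative only and its function names are hypothetical. It uses the fact that integrating the first-order Markov spectrum of (2.4) over $|\omega| \le 1$ gives the fraction $(2/\pi)\arctan(1/\alpha)$ of the total power, and that by (2.1) the all-pass filter has MSE equal to the noise power while the no-pass filter has MSE equal to the signal power.

```python
import math

def fractional_power(alpha, band_edge=1.0):
    """Fraction of a first-order Markov spectrum's power lying in |w| <= band_edge."""
    return (2.0 / math.pi) * math.atan(band_edge / alpha)

def trivial_output_snr_db(input_snr_db):
    """Output SNR of trivial filtering.

    All-pass (H = 1): MSE = noise power, so output SNR = input SNR.
    No-pass  (H = 0): MSE = signal power, so output SNR = 0 dB.
    """
    return max(input_snr_db, 0.0)

if __name__ == "__main__":
    print(fractional_power(1.0))    # signal, alpha_S = 1
    print(fractional_power(10.0))   # noise, alpha_N = 10
    print([trivial_output_snr_db(x) for x in (-10.0, 0.0, 10.0)])
```

With $\alpha_S = 1$ and $\alpha_N = 10$ the two fractions agree with the values $p_S = 0.5$ and $p_N = 0.063$ given in the text.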

Finally  we  note  that  uncertainty  classes  of  spectra  given  by  assuming 
exact  knowledge  only  of  the  total  and  fractional  powers  are  called  p-point 
classes  and  have  been  studied  as  models  of  spectral  uncertainty  by  Cimini 
and  Kassam  [25].  An  analogous  uncertainty  class  for  probabilities  used  in 
robust  hypothesis  testing  and  robust  detection  has  been  examined  by  El-Sawy 
and VandeLinde [5], [8]. These classes will be considered in greater detail
in  Chapter  V. 

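The $\epsilon$-contamination class. Suppose that we again have a particular
spectral pair $(\sigma_0,\nu_0)$ which we believe to be the true spectra, but that we
also have a general sense of uncertainty about our choice which we model by
an $\epsilon$-contaminated class; i.e., we assume we know that the true spectra
satisfy $(\sigma,\nu) \in \mathcal{S}_\epsilon \times \mathcal{N}_\epsilon$ where $0 \le \epsilon \le 1$,

$$ \mathcal{S}_\epsilon = \Big\{ \sigma \;\Big|\; \sigma(\omega) = (1-\epsilon)\sigma_0(\omega) + \epsilon\sigma'(\omega),\ \omega \in \mathbb{R},\ \int_{-\infty}^{\infty} \sigma'(\omega)\,d\omega = \int_{-\infty}^{\infty} \sigma_0(\omega)\,d\omega \Big\} $$

and

$$ \mathcal{N}_\epsilon = \Big\{ \nu \;\Big|\; \nu(\omega) = (1-\epsilon)\nu_0(\omega) + \epsilon\nu'(\omega),\ \omega \in \mathbb{R},\ \int_{-\infty}^{\infty} \nu'(\omega)\,d\omega = \int_{-\infty}^{\infty} \nu_0(\omega)\,d\omega \Big\} . \tag{2.5} $$

Classes of this form have been used extensively as general models of uncertainty
[39], [14], [27], [40], [1], [23].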

Fig. 2 gives the performance of the Wiener filter designed via
equation (2.2) assuming a narrow-band ($\alpha_S = 1$) first-order Markov signal in
wide-band ($\alpha_N = 1000$) first-order Markov noise. The upper line gives the
performance of this filter when these are the true signal and noise. The
lower line is the worst case of this filter over the uncertainty classes
in (2.5) with $\sigma_0$ and $\nu_0$ given by the above choices and with $\epsilon = 0.1$. We see
that, for values of input SNR near zero, the worst case is better than trivial
filtering but still much worse than optimal (about 8.5 dB); for values of
input SNR greater in absolute value than 60 the performance in both the
nominal and worst cases is the same as trivial filtering; and for all other
values the worst case is worse than trivial filtering.

An $\epsilon$-contaminated signal in white noise. Fig. 3 shows the nominal
and worst case performance of the nominal Wiener filter for the signal
uncertainty class $\mathcal{S}_\epsilon$ in (2.5) with $\epsilon = 0.1$ and $\sigma_0$ first-order Markov with
$\alpha_S = 1$. The noise is white noise with no uncertainty and the horizontal
axis is actually the ratio of signal power $v_S^2$ to the noise level $N_0/2$. Note
that the worst case is bounded above by 10 dB; in fact, for any choice of $\epsilon$, it
is bounded above by $-10 \log_{10}(\epsilon)$ dB.

FIG. 3. $\epsilon$-contaminated signal in white noise. $H_0^*$ at $(\sigma_0,\nu_0)$;
$H_0^*$ at its worst case. (Horizontal axis: input SNR in dB.)

As noted above, the optimal and worst-case performance of Wiener
filtering under various conditions has been examined extensively for several
uncertainty models and for a variety of signal and noise parameters (such as
bandwidth and power). The above examples are representative of the sensitivity
of Wiener filtering to deviations from spectral assumptions which
was found in virtually every case. Further examples are pictured in the
appendix.


3.  Robust  Wiener  Filters 

To remedy the problems of Wiener filtering sensitivity discussed in
the preceding section, we consider the following robust filter design which
was developed by Poor [2] based on the work of Kassam and Lim [1].

A most-robust Wiener filter [2] is a solution $H_R^*$ to the game

$$ \min_{H}\ \sup_{(\sigma,\nu)\in\mathcal{S}\times\mathcal{N}} e(\sigma,\nu;H) \tag{2.6} $$

where $\mathcal{S}$ and $\mathcal{N}$ are classes of spectra representing uncertainty in the signal
and noise, respectively, and where $e(\sigma,\nu;H)$ is given in (2.1). Note that
since the supremum in (2.6) gives the least upper bound on the error,
$H_R^*$ is a filter with the smallest possible such upper bound. In other words,
$H_R^*$ is least sensitive to worst case uncertainty.

A pair of spectra $(\sigma_L,\nu_L)$ is least favorable for Wiener filtering for
the spectral uncertainty classes $\mathcal{S}$ and $\mathcal{N}$ [2] if

$$ e(\sigma,\nu;H_L^*) \le e(\sigma_L,\nu_L;H_L^*) \tag{2.7} $$

for all $\sigma \in \mathcal{S}$, $\nu \in \mathcal{N}$, where $H_L^*$ is the Wiener filter for the pair $(\sigma_L,\nu_L)$
as in (2.2).

It is straightforward to see that if $(\sigma_L,\nu_L) \in \mathcal{S}\times\mathcal{N}$ is least favorable
for Wiener filtering for $\mathcal{S}$ and $\mathcal{N}$ then the pair $((\sigma_L,\nu_L),H_L^*)$ is a saddle-point
solution to the minimax game (2.6). That is,

$$ \sup_{(\sigma,\nu)\in\mathcal{S}\times\mathcal{N}} e(\sigma,\nu;H_L^*) = e(\sigma_L,\nu_L;H_L^*) = \min_{H} e(\sigma_L,\nu_L;H). \tag{2.8} $$

We see from this that if $(\sigma_L,\nu_L)$ is least favorable then $H_L^*$ is a most-robust
Wiener filter.

Thus we see that if we can find a least-favorable pair then we can
design a most-robust Wiener filter. One of the methods developed by
Poor [2] for finding least favorable pairs of spectra (and hence most-robust
filters) involves an analogous concept in hypothesis testing: least-favorable
probability density functions (PDF's) for testing one set of PDF's
against another. Least-favorable PDF's have been found for a variety of
classes of PDF's (see [14], [27], [20], [21], and Chapter V). If
every signal spectrum in $\mathcal{S}$ has the same finite power $v_S^2$ and every noise
spectrum in $\mathcal{N}$ has the same finite power $v_N^2$ then we can define classes of
PDF's

$$ \mathcal{P}_S = \{ f_S \mid f_S(\omega) = \sigma(\omega)/2\pi v_S^2 ,\ \sigma \in \mathcal{S} \} $$

$$ \mathcal{P}_N = \{ f_N \mid f_N(\omega) = \nu(\omega)/2\pi v_N^2 ,\ \nu \in \mathcal{N} \} $$

and possibly apply the following ([2], Corollary 1).

Theorem 2.1: If $\mathcal{S}$ and $\mathcal{N}$ are convex and have constant powers $v_S^2$ and $v_N^2$,
respectively, and $q_S \in \mathcal{P}_S$ and $q_N \in \mathcal{P}_N$ are least-favorable PDF's for $\mathcal{P}_S$ versus
$\mathcal{P}_N$, then $\sigma_L = 2\pi v_S^2 q_S$ and $\nu_L = 2\pi v_N^2 q_N$ are least favorable spectra
for Wiener filtering for $\mathcal{S}$ and $\mathcal{N}$.

This theorem allows us to construct most-robust Wiener filters for the
first two examples considered in Section 2.

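The p-point class. It can be seen from [25] that

$$ H_R^*(\omega) = \begin{cases} \dfrac{p_S v_S^2}{p_S v_S^2 + p_N v_N^2} & \text{for } \omega \in S \\[2ex] \dfrac{(1-p_S) v_S^2}{(1-p_S) v_S^2 + (1-p_N) v_N^2} & \text{for } \omega \in S^c \end{cases} $$

and, hence,

$$ e(\sigma,\nu;H_R^*) = v_S^2 \left[ \frac{p_S p_N}{p_S r + p_N} + \frac{(1-p_S)(1-p_N)}{(1-p_S) r + (1-p_N)} \right] \quad \text{for all } (\sigma,\nu) \in \mathcal{S}\times\mathcal{N}, $$

where $r \triangleq v_S^2/v_N^2$, the input SNR. In Fig. 4 we have superimposed onto Fig. 1
the performance of $H_R^*$ (the middle line). It is clear from Fig. 4 that,
unless we are extremely certain about our choice of $\sigma$ and $\nu$, $H_R^*$ is preferable
to $H_0^*$.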
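The constant MSE of the p-point robust filter can be checked directly. The sketch below is a hedged illustration with hypothetical function names. It relies on the observation that a filter taking one constant value on $S$ and another on $S^c$ has, by (2.1), an MSE that depends on the spectra only through the band powers $p_S v_S^2$, $p_N v_N^2$ and their complements, which are the same for every member of the class.

```python
def p_point_robust_gains(p_s, p_n, vs2, vn2):
    """Gains of the robust filter on S and on its complement."""
    k_in = p_s * vs2 / (p_s * vs2 + p_n * vn2)
    k_out = (1.0 - p_s) * vs2 / ((1.0 - p_s) * vs2 + (1.0 - p_n) * vn2)
    return k_in, k_out

def p_point_robust_mse(p_s, p_n, vs2, vn2):
    """MSE expression for the robust filter, written in terms of r = vs2 / vn2."""
    r = vs2 / vn2
    return vs2 * (p_s * p_n / (p_s * r + p_n)
                  + (1.0 - p_s) * (1.0 - p_n) / ((1.0 - p_s) * r + (1.0 - p_n)))

def piecewise_constant_mse(k_in, k_out, p_s, p_n, vs2, vn2):
    """Equation (2.1) for a filter equal to k_in on S and k_out elsewhere."""
    inside = p_s * vs2 * (1.0 - k_in) ** 2 + p_n * vn2 * k_in ** 2
    outside = (1.0 - p_s) * vs2 * (1.0 - k_out) ** 2 + (1.0 - p_n) * vn2 * k_out ** 2
    return inside + outside

if __name__ == "__main__":
    p_s, p_n = 0.5, 0.063  # fractional powers of the Fig. 1 example
    k_in, k_out = p_point_robust_gains(p_s, p_n, 1.0, 1.0)
    print(piecewise_constant_mse(k_in, k_out, p_s, p_n, 1.0, 1.0),
          p_point_robust_mse(p_s, p_n, 1.0, 1.0))
```

Because each band gain is the best constant for its band, the resulting MSE can never exceed that of the all-pass or no-pass filter, which is one way to see why the robust curve in Fig. 4 does not fall below trivial filtering.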

FIG. 4. p-point example.

The $\epsilon$-contaminated class. For the classes in (2.5) it can be easily
seen from the above theorem and [14] that

$$ H_R^*(\omega) = \begin{cases} k' & \text{for } H_0^*(\omega) < k' \\ H_0^*(\omega) & \text{for } k' \le H_0^*(\omega) \le k'' \\ k'' & \text{for } H_0^*(\omega) > k'' , \end{cases} \qquad k' \triangleq \frac{c' r}{c' r + 1}, \quad k'' \triangleq \frac{c'' r}{c'' r + 1}, $$

where $0 \le c' < c'' \le \infty$ are constants given by Huber [14]. It is interesting
to note that the robust filter $H_R^*$ has this same form for several other
uncertainty models (see [2], [27]). Also note that this $H_R^*$ will
not have constant MSE over $\mathcal{S}\times\mathcal{N}$ as in the previous example. In Fig. 5
we have superimposed onto Fig. 2 the performance of $H_R^*$ when the true spectral
pair is $(\sigma_0,\nu_0)$ (the second line from the top) and when the true spectral
pair is $(\sigma_L,\nu_L)$ (the third line from the top). Recall from the definition
of $(\sigma_L,\nu_L)$ that the latter is the worst-case performance of $H_R^*$. For this
example $c' = 1/c'' = 0.125$.

Unlike the preceding example, the preferability of the most-robust filter is
not so clear-cut. If one were relatively certain about $(\sigma_0,\nu_0)$ being correct then
$H_0^*$ would be the better choice; however, if not, and if the guaranteed level of performance
over $\mathcal{S}\times\mathcal{N}$ (given by the third line down) were adequate, we would likely choose $H_R^*$.
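The robust filter for the $\epsilon$-contaminated class is the nominal Wiener gain clipped to the interval $[k', k'']$. A minimal sketch of that clipping follows. The function names are hypothetical, the constants $c'$ and $c''$ are taken as given inputs (the text attributes them to Huber [14] and this sketch does not compute them), and only finite $c''$ is handled.

```python
def clip_levels(c_lo, c_hi, r):
    """Clipping levels k' and k'' for constants c' = c_lo, c'' = c_hi and input SNR r."""
    k_lo = c_lo * r / (c_lo * r + 1.0)
    k_hi = c_hi * r / (c_hi * r + 1.0)
    return k_lo, k_hi

def robust_gain(h0, k_lo, k_hi):
    """Nominal Wiener gain h0 = H_0*(w), limited to the interval [k_lo, k_hi]."""
    return min(max(h0, k_lo), k_hi)

if __name__ == "__main__":
    # c' = 1/c'' = 0.125 as in the example of the text; r = 1 is 0 dB input SNR.
    k_lo, k_hi = clip_levels(0.125, 8.0, 1.0)
    for h0 in (0.01, 0.3, 0.6, 0.99):
        print(h0, robust_gain(h0, k_lo, k_hi))
```

The clipping keeps the filter from suppressing or passing any frequency band too aggressively, so contaminating power placed where the nominal gain is extreme does less damage.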

An $\epsilon$-contaminated signal in white noise. Clearly the above theorem
cannot be applied to find a robust filter in this case since the noise has
infinite power; however a more direct approach proves fruitful here. First,
we may restrict our search to $H \in L^2(d\omega)$, the mean-square integrable functions
on $\mathbb{R}$, the real line, since all others have infinite MSE regardless of what
$\sigma$ is (cf. equation (2.1)). Second, we have, for all $H \in L^2(d\omega)$,

$$ \begin{aligned} \sup_{\sigma\in\mathcal{S}_\epsilon} e(\sigma,\nu_0;H) &= \sup_{\sigma'} \frac{1}{2\pi}\int_{-\infty}^{\infty} \Big[ |1-H(\omega)|^2 \big((1-\epsilon)\sigma_0(\omega) + \epsilon\sigma'(\omega)\big) + |H(\omega)|^2 \nu_0(\omega) \Big]\, d\omega \\ &= e((1-\epsilon)\sigma_0,\nu_0;H) + \epsilon \sup_{\sigma'} \frac{1}{2\pi}\int_{-\infty}^{\infty} |1-H(\omega)|^2 \sigma'(\omega)\, d\omega \\ &\ge e((1-\epsilon)\sigma_0,\nu_0;H) + \epsilon v_S^2 . \end{aligned} \tag{2.9} $$

The last step is true because $H \in L^2(d\omega)$ and it is assumed that $\int_{-\infty}^{\infty} \sigma'(\omega)\,d\omega = 2\pi v_S^2$.

FIG. 5. $\epsilon$-contaminated example. (From top to bottom) $H_0^*$ at $(\sigma_0,\nu_0)$;
$H_R^*$ at $(\sigma_0,\nu_0)$; $H_R^*$ at $(\sigma_L,\nu_L)$ ($H_R^*$'s worst case); $H_0^*$ at its
worst case.

Clearly equation (2.2) and equation (2.6) (the definition of $H_R^*$) imply that

$$ H(\omega) = \frac{(1-\epsilon)\sigma_0(\omega)}{(1-\epsilon)\sigma_0(\omega) + N_0/2} \tag{2.10} $$

minimizes (over $H$) the last expression in (2.9). But, for the value of $H$
given in (2.10), which satisfies $0 \le H(\omega) \le 1$, we have equality in (2.9). Thus, $H_R^*$ for this problem is
given by (2.10).
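The worst-case performance of the filter (2.10) can be evaluated from (2.9) and (2.3). The sketch below is illustrative only: the function names and the midpoint rule with $\omega = \tan\theta$ are assumptions, and the nominal signal is taken to be the first-order Markov spectrum of (2.4). It also shows the ceiling of $-10\log_{10}(\epsilon)$ dB on the worst-case output SNR.

```python
import math

def robust_gain(w, eps, alpha_s, vs2, n0_half):
    """Equation (2.10) with the first-order Markov nominal signal of (2.4)."""
    s = (1.0 - eps) * 2.0 * alpha_s * vs2 / (alpha_s ** 2 + w * w)
    return s / (s + n0_half)

def worst_case_mse(eps, alpha_s, vs2, n0_half, n=20000):
    """Last line of (2.9) evaluated at the filter (2.10), where equality holds."""
    step = math.pi / n
    total = 0.0
    for i in range(n):
        theta = -math.pi / 2.0 + (i + 0.5) * step
        w = math.tan(theta)
        # By (2.3), the minimum of e((1-eps)*sigma0, nu0; H) is (1/2pi) * int H * nu0,
        # with nu0 = n0_half for white noise.
        total += robust_gain(w, eps, alpha_s, vs2, n0_half) * n0_half / math.cos(theta) ** 2
    return total * step / (2.0 * math.pi) + eps * vs2

def worst_case_snr_db(eps, alpha_s, vs2, n0_half):
    """Worst-case output SNR in dB: signal power over worst-case MSE."""
    return 10.0 * math.log10(vs2 / worst_case_mse(eps, alpha_s, vs2, n0_half))

if __name__ == "__main__":
    eps = 0.1
    for n0_half in (10.0, 1.0, 0.1, 0.01):
        print(n0_half, worst_case_snr_db(eps, 1.0, 1.0, n0_half))
    print("ceiling:", -10.0 * math.log10(eps))
```

Since the worst-case MSE always contains the term $\epsilon v_S^2$, lowering the noise level cannot push the worst-case output SNR past the ceiling.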

Recall that Fig. 3 showed the performance of $H_0^*$ in this situation
in its nominal and worst cases. If we superimposed the nominal and worst
cases of $H_R^*$ onto Fig. 3, as we have done for the other examples, we would
find no change; i.e., up to the accuracy of the graph the nominal cases
and worst cases of $H_0^*$ and $H_R^*$ are the same. In fact, they differ by no
more than 0.01. It should be noted that this is a singular example and
the unusual performance is due to the infinite power of the white noise,
not to the "very wide bandedness" which white noise is generally used to
model.


4.  Discussion and Conclusions

As  we  have  discussed  above,  the  results  presented  in  this  chapter 
(with  the  one  exception  of  the  white  noise  example)  are  representative  of 
our  findings  in  a  wide  variety  of  cases.  For  example,  although  it  is  a 
much  harder  case  to  solve,  we  have  developed  numerical  results  for  causal 
Wiener  filtering  of  an  e-contaminated  first-order  Markov  signal  in  first- 
order  Markov  noise.  The  theory  of  the  causal  case  has  not  been  developed 
in  the  same  generality  as  the  noncausal  case;  however,  this  specific 
example can be treated using the results of Poor [2] and Yao [16]. In
Fig. 6 we have presented the results for this causal filtering example
with $\epsilon = 0.1$, $\alpha_S = 1$ and $\alpha_N = 1000$. For comparison we have also included
Fig. 7 which gives the results for the corresponding noncausal case. Note
the similarity between the two figures. Again, this is indicative of our
findings over a wide range of signal and noise bandwidths and $\epsilon$'s (see the
appendix).

FIG. 6. Causal example. (From top to bottom) $H_0^*$ at $(\sigma_0,\nu_0)$; $H_R^*$ at
$(\sigma_0,\nu_0)$; $H_R^*$ at $(\sigma_L,\nu_L)$ ($H_R^*$'s worst case); $H_0^*$ at its worst case.

Other situations we examined in the noncausal case include ones with
$\sigma$ and/or $\nu$ as second-order Markov (i.e., having the form $4\alpha^3 v^2/(\alpha^2 + \omega^2)^2$)
or using bandlimited white noise. The results for all these cases were
similar to those already presented (e.g., Fig. 5). (Again, see the appendix.)
Of  particular  interest  is  the  case  of  an  e-contaminated  first-order 
Markov  signal  in  e-contaminated  bandlimited  white  noise.  Even  when  the 
bandwidth  of  the  noise  was  extremely  large  (e.g.  10^)  the  results  were  similar 
to  the  other  cases  and  unlike  those  involving  nonbandlimited  white  noise 
(cf.  the  remarks  at  the  end  of  Section  3  and  Figure  22  in  the  appendix). 

In  summary,  the  Wiener  filter  can  be  undesirably  sensitive  to  small 
deviations  from  assumed  spectral  models.  Furthermore,  while  there  are 
enough  specific  cases  to  the  contrary  to  make  caution  advisable,  we  have 
found  for  a  wide  variety  of  situations  that,  when  spectral  uncertainty 
exists, the robust Wiener filter is generally preferable to the traditional
Wiener filter.

FIG. 7. Noncausal version of FIG. 6 (all parameters the same).


III.  ROBUST WIENER-KOLMOGOROV THEORY


1.  Introduction 

We saw in Chapter II that optimal linear continuous-time (Wiener)
noncausal  filters  can  be  undesirably  sensitive  to  spectral  uncertainty 
and  that  often  the  robust  Wiener  filters  developed  in  [2]  are  preferable 
because  of  their  insensitivity  to  such  uncertainty.  In  this  chapter  we 
consider  a  general  formulation  of  robust  linear  discrete-time  causal 
estimation  of  a  linear  function  of  a  wide-sense  stationary  signal.  This 
formulation  is  analogous  to  the  robust  Wiener  noncausal  filtering 
development  in  [2]  which  is  summarized  in  Chapter  II,  Section  3. 

Recall  that  the  essential  steps  of  this  formulation  consist  of,  first, 
choosing two classes of spectra which model the signal and noise uncertainty
and, second, finding the signal estimator which minimizes the
maximum  error  over  these  spectral  uncertainty  classes.  Our  main  results 
yield,  under  mild  conditions  on  these  spectral  uncertainty  classes,  a 
method  of  designing  robust  n-step  predictors,  robust  causal  filters  and 
robust  finite-lag  smoothers  and  guarantee  their  existence.  In  order  to 
illustrate  this  method  of  design,  the  special  case  of  robust  one-step 
noiseless  prediction  is  developed  in  detail.  Also,  numerical  examples 
are  given  for  robust  causal  filtering  of  an  uncertain  signal  in  white 
noise. 

In Section 2 we briefly present the traditional discrete-time
(Wiener-Kolmogorov) signal estimation problem [28] and discuss those
aspects which will be relevant in the sequel. In Section 3, we present
the robust version of this problem and state and prove the main theorems.
Section 3 includes a discussion of commonly used models of uncertainty.
In Section 4, we apply the results of Section 3 to the problem of robust
one-step noiseless prediction. For this case, explicit expressions are
developed using an analogy with robust hypothesis testing originally
developed in [2]. In Section 5, examples are given and discussed. In
Section 6 we consider the development of Section 3 in the more general
situation when the signal and noise are represented by spectral distributions
rather than spectral densities as in Section 3. Results of a
somewhat different nature are obtained. Finally, Section 7 contains
conclusions and general discussion.

2.  Background  and  Preliminaries 


Throughout this chapter we assume that we observe a portion of a realization
$\{y(k) \mid k \in Z,\ k \le k_0\}$ of a random process $\{Y(k) \mid k \in Z\}$, where $Z$
denotes the set of integers, and we assume that $Y(k) = S(k) + N(k)$,
$\forall k \in Z$, where $\{S(k) \mid k \in Z\}$ and $\{N(k) \mid k \in Z\}$ are second-order, wide-sense
stationary random processes which are uncorrelated with each other.
We can also assume that $\{N(k) \mid k \in Z\}$ is zero-mean. The processes $\{S(k)\}$
and $\{N(k)\}$ represent signal and noise, respectively.

Our purpose is to form a linear causal estimate of a linear function
of $\{s(k)\}$ from the observation $\{y(k)\}$. That is, we are given some function
of the signal having the form

$$ s_d(k) = \sum_{n=-\infty}^{\infty} d(k-n)\, s(n) \tag{3.1} $$

and we wish to find the "best" estimate of $\{s_d(k)\}$ among all estimates having
the form

$$ \hat{s}_h(k) = \sum_{n=-\infty}^{\infty} h(k-n)\, y(n) \tag{3.2} $$

where $h$ satisfies $h(n) = 0$ for $n < 0$ (causality). The usual criterion of
optimality is to find such an $h$ minimizing the mean-square error (MSE).
For given functions $d$ and $h$ the well-known formula for the MSE is

$$ E\{[s_d(k) - \hat{s}_h(k)]^2\} = \int_{-\pi}^{\pi} \left[ |D(\theta)-H(\theta)|^2 f_S(\theta) + |H(\theta)|^2 f_N(\theta) \right] d\lambda(\theta) \triangleq e_D(f_S,f_N;H) \tag{3.3} $$


where $D$ and $H$ are the transfer functions (i.e., Fourier transforms) of
the transformations $d$ and $h$, respectively, and $f_S$ and $f_N$ are the power
spectral densities (PSD's) of $\{S(k)\}$ and $\{N(k)\}$, the signal and noise, with
respect to the finite measure $\lambda$. In general, we will take $\lambda$ to be Lebesgue
measure on $[-\pi,\pi]$ (so that $f_S$ and $f_N$ are just the usual PSD's) but it also
might be convenient, for example, to allow $\lambda$ to include some point masses
in order to represent pure sinusoids in a mathematically rigorous fashion.
(More will be said about this in Section 6.)

We refer to the transformation $d$ (or its transfer function $D$) as the
desired operation. The cases of greatest interest occur when $d(n) = 1$ for
$n = n_0$ and $d(n) = 0$ for $n \ne n_0$ (correspondingly $D(\theta) = e^{i n_0 \theta}$). If $n_0 = 0$
our problem is causal filtering, if $n_0 < 0$ it is prediction $|n_0|$ steps ahead,
and if $n_0 > 0$ we have smoothing with fixed lag $n_0$.
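The three cases of the desired operation can be illustrated by applying (3.1) to a known sequence. The sketch below uses hypothetical function names and truncates the infinite sum in (3.1) to a finite index window, which is harmless here because $d$ has a single nonzero value. With $d$ concentrated at $n_0$, the sum picks out $s(k - n_0)$.

```python
def delta_operation(n0):
    """Desired operation d with d(n0) = 1 and d(n) = 0 otherwise."""
    return lambda n: 1.0 if n == n0 else 0.0

def apply_operation(d, s, k, support):
    """Evaluate s_d(k) = sum_n d(k - n) s(n) of (3.1) over a finite window of n."""
    return sum(d(k - n) * s(n) for n in support)

def task_name(n0):
    """Name of the estimation task associated with the shift n0."""
    if n0 == 0:
        return "causal filtering"
    if n0 < 0:
        return f"prediction {-n0} step(s) ahead"
    return f"smoothing with fixed lag {n0}"

if __name__ == "__main__":
    s = lambda n: float(n * n)   # an arbitrary test sequence
    window = range(-20, 21)
    for n0 in (0, -2, 2):
        # The target at time k = 3 is s(3 - n0).
        print(task_name(n0), apply_operation(delta_operation(n0), s, 3, window))
```

Since the estimate at time $k$ may use observations only up to time $k$, a target of $s(k - n_0)$ with $n_0 < 0$ lies in the future (prediction) and with $n_0 > 0$ lies in the past (smoothing).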


If the PSD's $f_S$ and $f_N$ are known and the desired operation $D(\theta)$ is
given then we would like to minimize $e_D(f_S,f_N;H)$ over all causal transfer
functions. We denote by $H^+$ the solution to this minimization problem and
refer to $H^+$ as the optimal causal transfer function for $(f_S,f_N)$. Note that
$H^+$ is unique a.e. $(f_S + f_N)\,d\lambda$ (i.e., if $H'$ also minimizes $e_D(f_S,f_N;H)$ over
all causal transfer functions then $\int_{\{H' \ne H^+\}} [f_S(\theta) + f_N(\theta)]\,d\lambda(\theta) = 0$; see [3]).

For the remainder of this chapter we will assume that the PSD's $f_S$ and $f_N$ are bounded a.e. $[\lambda]$. Further, we will only consider filters $H$ which are mean-square integrable on $[-\pi,\pi]$ with respect to $\lambda$, i.e. we will assume that $H \in L^2 \triangleq L^2(d\lambda(\theta))$. Of course, we only want to consider those $H \in L^2$ which are causal transfer functions. The set of such $H$ is denoted by $H^2_+ = H^2_+(d\lambda(\theta))$ and can be defined as the (closed) subspace of the Hilbert space $L^2$ which is spanned by $\{e^{in\theta} \mid n = 0,1,2,\ldots\}$. $H^2_+$ is called a Hardy space (see [3], [4], or [18]). With these definitions the minimum-MSE problem for a specific pair of PSD's $(f_S,f_N)$ and desired operation $D(\theta)$ can be formulated as

$$ e_D(f_S,f_N) \triangleq \min_{H \in H^2_+} e_D(f_S,f_N;H). \tag{3.4} $$

3.  Robust  Linear  Estimation  of  a  Signal  in  Noise 

Throughout this section we will assume that the desired operation $D(\theta)$ is fixed and bounded. It is clear from Section 2 that the solution, $H^+(\theta)$, to the minimum-MSE estimation problem (3.4) depends entirely on the signal and noise PSD's, $f_S$ and $f_N$. As we discussed in Section 1, the spectra we choose for designing $H^+$ may differ somewhat from the true signal and noise spectra. We model this spectral uncertainty by choosing appropriate classes of PSD's, $\mathcal{S}$ and $\mathcal{N}$, and assuming that $f_S \in \mathcal{S}$ and $f_N \in \mathcal{N}$; in other words, we know that $f_S$ belongs to $\mathcal{S}$ and that $f_N$ belongs to $\mathcal{N}$, but we do not know which elements of $\mathcal{S}$ and $\mathcal{N}$ represent the true signal and noise spectra.

Clearly any transfer function $H_R$ for which there exists a finite upper bound on the MSE, $e_D(f_S,f_N;H_R)$, over all $f_S$ in $\mathcal{S}$ and $f_N$ in $\mathcal{N}$ could be termed robust since a certain level of performance can be guaranteed by using $H_R$. (That level being $\sup_{(f_S,f_N) \in \mathcal{S}\times\mathcal{N}} e_D(f_S,f_N;H_R)$.) Ideally we would like a (causal) transfer function with the smallest possible such upper bound, i.e., we would like to find a solution $H_R$ to the game

$$ \inf_{H \in H^2_+} \; \sup_{(f_S,f_N) \in \mathcal{S}\times\mathcal{N}} e_D(f_S,f_N;H). \tag{3.5} $$

As in Chapter II, we refer to $H_R$ as a most-robust causal transfer function for the spectral uncertainty classes $\mathcal{S}$ and $\mathcal{N}$.


We now give some specific forms for the uncertainty classes $\mathcal{S}$ and $\mathcal{N}$. These forms have been widely used to model uncertainty in both the engineering and statistics literature. We will exhibit these forms for the class $\mathcal{S}$ but, of course, they could just as easily be used to model noise spectral uncertainty.

Certainly the most commonly used uncertainty class ([6], [7], [14], [20], [23], [32], [37], [39], [40], [43], [44], [46]-[48]) is the $\epsilon$-contaminated model (also called the $\epsilon$-mixture or gross-error model). It has the form

$$ \mathcal{S}_\epsilon = \left\{ f_S \;\middle|\; f_S(\theta) = (1-\epsilon) f_S^0(\theta) + \epsilon f_S'(\theta)\ \ \forall\theta \in [-\pi,\pi],\ \ \int_{-\pi}^{\pi} f_S'(\theta)\,d\lambda(\theta) = \int_{-\pi}^{\pi} f_S^0(\theta)\,d\lambda(\theta) \right\} \tag{3.6} $$

where $f_S^0$ is a nominal PSD and $\epsilon$ ($0 < \epsilon \le 1$) is the contamination parameter. This class is probably the most popular for representing uncertainty because it models the idea that we have a fraction $\epsilon$ of completely general uncertainty about our choice of the PSD $f_S^0$.

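Another common model ([2], [6], [14], [20], [44]) is the total variation model which has the form

$$ \mathcal{S}_{TV} = \left\{ f_S \;\middle|\; \frac{1}{2\pi}\int_{-\pi}^{\pi} |f_S(\theta) - f_S^0(\theta)|\,d\lambda(\theta) \le \epsilon,\ \ \int_{-\pi}^{\pi} f_S(\theta)\,d\lambda(\theta) = \int_{-\pi}^{\pi} f_S^0(\theta)\,d\lambda(\theta) \right\} \tag{3.7} $$

where, again, $f_S^0$ is a nominal PSD and $\epsilon$ an uncertainty parameter.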

A third model is the band model ([1], [21], [22]) which has the form

$$ \mathcal{S}_B = \left\{ f_S \;\middle|\; f_S^L(\theta) \le f_S(\theta) \le f_S^U(\theta)\ \ \forall\theta,\ \ \int_{-\pi}^{\pi} f_S(\theta)\,d\lambda(\theta) = 2\pi w \right\} \tag{3.8} $$

where $\int_{-\pi}^{\pi} f_S^L(\theta)\,d\lambda(\theta) \le 2\pi w \le \int_{-\pi}^{\pi} f_S^U(\theta)\,d\lambda(\theta)$ and $w$ is the (known) power of the signal. The name band model comes from the idea that $f_S^L$ and $f_S^U$ are the lower and upper bounds of a confidence band around a spectral estimate.

The last model is the p-point (or Sakrison's class b) model ([5], [8], [25], [41], [42]) which has the form

$$ \mathcal{S}_P = \left\{ f_S \;\middle|\; \int_{A_i} f_S(\theta)\,d\lambda(\theta) = 2\pi w_i,\ \ i = 1,\ldots,n \right\} \tag{3.9} $$

where the $A_i$'s are a partition of $[-\pi,\pi]$ and $\sum_{i=1}^{n} w_i = w$, the power of the signal. A p-point class is an appropriate model of uncertainty in situations where, for example, we can accurately measure the power $w_i$ in each interval $A_i = [\theta_{i-1},\theta_i]$ (where $-\pi = \theta_0 < \theta_1 < \ldots < \theta_n = \pi$) using a nested bank of low pass filters or a bank of band pass filters. Note that unless $n$ is quite large we are allowing a considerable amount of uncertainty when we use a p-point class. It might be more reasonable to obtain also some other form of spectral estimate and use a band model in addition (i.e., let our class be the intersection of a p-point model with a band model). We call this a banded p-point model.

We note that for each of these classes we have assumed that the power is known. Often it is a reasonable assumption that the power can be accurately estimated even though the shape of the PSD is uncertain. Furthermore, in all the specific cases in which we have found most-robust filters (see Section 5) it turns out that they do not depend on the specific signal and noise powers ($w_S$ and $w_N$, respectively) but only on the input (to the filter) signal-to-noise ratio $w_S/w_N$.
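
To make the four class definitions concrete, the following sketch tests membership of a sampled PSD in each class when $\lambda$ is Lebesgue measure. Everything here is illustrative: the function names, the grid, the tolerance, and the example spectra are our own assumptions, and the $1/2\pi$ normalization used for the total variation distance is likewise an assumption about (3.7).

```python
import numpy as np

theta = np.linspace(-np.pi, np.pi, 8192, endpoint=False)
dtheta = theta[1] - theta[0]

def integral(f, mask=None):
    """Riemann-sum approximation of the integral of f over [-pi, pi] or a subset."""
    values = f if mask is None else f[mask]
    return float(np.sum(values) * dtheta)

def in_contaminated(f, f0, eps, tol=1e-6):
    """(3.6): f - (1 - eps) f0 must be nonnegative and f must have the power of f0."""
    return bool(np.all(f - (1 - eps) * f0 >= -tol)
                and abs(integral(f) - integral(f0)) <= tol)

def in_total_variation(f, f0, eps, tol=1e-6):
    """(3.7), assuming the distance is normalized by 1/(2 pi)."""
    return bool(integral(np.abs(f - f0)) / (2 * np.pi) <= eps + tol
                and abs(integral(f) - integral(f0)) <= tol)

def in_band(f, f_low, f_up, w, tol=1e-6):
    """(3.8): f lies between the band edges and has power w."""
    return bool(np.all(f >= f_low - tol) and np.all(f <= f_up + tol)
                and abs(integral(f) - 2 * np.pi * w) <= tol)

def in_p_point(f, edges, powers, tol=1e-6):
    """(3.9): the integral of f over [edges[i], edges[i+1]) equals 2 pi powers[i]."""
    for lo, hi, w_i in zip(edges[:-1], edges[1:], powers):
        mask = (theta >= lo) & (theta < hi)
        if abs(integral(f, mask) - 2 * np.pi * w_i) > tol:
            return False
    return True

# Hypothetical example: a unit-power nominal PSD mixed with a flat PSD.
w = 1.0
f0 = (1 - 0.5**2) * w / (1 - 2 * 0.5 * np.cos(theta) + 0.5**2)
flat = np.full_like(theta, w)
f = 0.9 * f0 + 0.1 * flat
```

With these definitions, `in_contaminated(f, f0, 0.1)` holds because `f` is exactly a 10% mixture of the nominal with another unit-power spectrum.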

Returning now to the definition of a most-robust filter (3.5) for general $\mathcal{S}$ and $\mathcal{N}$, we note that, while the minimum-MSE transfer function is known to exist for each $(f_S,f_N) \in \mathcal{S}\times\mathcal{N}$, this is no guarantee that the infimum in (3.5) is achieved. In fact, it need not even be finite. Of course, even if it were not achieved there would still exist transfer functions whose worst-case MSE over $\mathcal{S}\times\mathcal{N}$ would be arbitrarily close to this infimum.

In the following theorem we give some mild sufficient conditions for the infimum in (3.5) to be achieved and finite.

Theorem 3.1. If the spectral uncertainty classes $\mathcal{S}$ and $\mathcal{N}$ are such that the following two conditions hold

i) $\sup_{f_S \in \mathcal{S}} \frac{1}{2\pi}\int_{-\pi}^{\pi} f_S(\theta)\,d\lambda(\theta) = w_S < \infty$,

ii) Either (a) or (b) holds for some $\epsilon > 0$:

a) There is an $f_N \in \mathcal{N}$ such that $f_N(\theta) \ge \epsilon > 0$ a.e. $[\lambda]$,

b) There is an $f_S \in \mathcal{S}$ such that $f_S(\theta) \ge \epsilon > 0$ a.e. $[\lambda]$;

then there exists $H_R \in H^2_+$ achieving the infimum in (3.5). Furthermore this infimum is finite.

Proof. Let $H_0(\theta) = 0$, $\forall\theta$. (Note that $H_0 \in H^2_+$.) We have

$$ \inf_{H^2_+} \sup_{\mathcal{S}\times\mathcal{N}} e_D(f_S,f_N;H) \le \sup_{\mathcal{S}\times\mathcal{N}} e_D(f_S,f_N;H_0) = \sup_{f_S \in \mathcal{S}} \frac{1}{2\pi}\int_{-\pi}^{\pi} |D(\theta)|^2 f_S(\theta)\,d\lambda(\theta) \le B_D \sup_{f_S \in \mathcal{S}} \frac{1}{2\pi}\int_{-\pi}^{\pi} f_S(\theta)\,d\lambda(\theta) = B_D w_S < \infty $$

where $B_D$ is the essential supremum of $|D|^2$. Hence the infimum in (3.5) is finite and for any fixed $M$ satisfying $B_D w_S < M < \infty$ we may exclude from consideration any $H \in H^2_+$ which satisfies $e_D(f_S,f_N;H) > M$ for some $(f_S,f_N) \in \mathcal{S}\times\mathcal{N}$. We will now translate this to a bound on $\|H\| \triangleq \left[\int_{-\pi}^{\pi} |H(\theta)|^2\,d\lambda(\theta)\right]^{1/2}$, which is the norm of $H$ in the Hilbert space $H^2_+$.

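If condition (a) of (ii) in the statement of the theorem holds then for any $f_S \in \mathcal{S}$ and any $H \in H^2_+$ we have

$$ e_D(f_S,f_N;H) \ge \frac{1}{2\pi}\int_{-\pi}^{\pi} |H(\theta)|^2 f_N(\theta)\,d\lambda(\theta) \ge (\epsilon/2\pi)\,\|H\|^2 . $$

Hence in this case we may exclude any $H$ satisfying $\|H\|^2 > 2\pi M/\epsilon$. If, on the other hand, condition (b) of (ii) holds then for any $f_N \in \mathcal{N}$ and any $H \in H^2_+$ we have

$$ e_D(f_S,f_N;H) \ge \frac{1}{2\pi}\int_{-\pi}^{\pi} |D(\theta)-H(\theta)|^2 f_S(\theta)\,d\lambda(\theta) \ge (\epsilon/2\pi)\,\|D-H\|^2 \ge (\epsilon/2\pi)\,[\,\|D\| - \|H\|\,]^2 . $$

Hence we may exclude any $H$ satisfying $\|H\| > \|D\| + \sqrt{2\pi M/\epsilon}$.

Thus we have shown that if $B > \|D\| + \sqrt{2\pi B_D w_S/\epsilon}$ then we have

$$ \inf_{H^2_+} \sup_{\mathcal{S}\times\mathcal{N}} e_D(f_S,f_N;H) = \inf_{H^2_+(B)} \sup_{\mathcal{S}\times\mathcal{N}} e_D(f_S,f_N;H) \tag{3.10} $$

where $H^2_+(B) = \{H \in H^2_+ \mid \|H\| \le B\}$. We will now show that the right hand side of (3.10) has a solution.

Step 1: $e_D(f_S,f_N;\cdot)$ is continuous on $H^2_+$ for each $(f_S,f_N) \in \mathcal{S}\times\mathcal{N}$. We have

$$ 2\pi\,|e_D(f_S,f_N;H_n) - e_D(f_S,f_N;H)| \le \left| \int |D-H_n|^2 f_S\,d\lambda - \int |D-H|^2 f_S\,d\lambda \right| + \left| \int |H_n|^2 f_N\,d\lambda - \int |H|^2 f_N\,d\lambda \right| $$

$$ \le B_{f_S}\,\|H_n - H\|\,\big(\|D-H_n\| + \|D-H\|\big) + B_{f_N}\,\|H_n - H\|\,\big(\|H_n\| + \|H\|\big) $$

where $B_{f_S}$ and $B_{f_N}$ are the essential supremums of $f_S$ and $f_N$, respectively, and the second inequality follows from the Schwarz inequality. Now if $\|H_n - H\| \to 0$ then $\|(D-H_n) - (D-H)\| \to 0$. Hence from the inequality $\|H_1 - H_2\| \ge |\,\|H_1\| - \|H_2\|\,|$ we have that $\|H_n\| \to \|H\|$ and $\|D-H_n\| \to \|D-H\|$, so the norms on the right hand side remain bounded. Hence $e_D(f_S,f_N;\cdot)$ is continuous.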

Step 2: $\sup_{\mathcal{S}\times\mathcal{N}} e_D(f_S,f_N;\cdot)$ is lower semicontinuous (l.s.c.) on $H^2_+$. This is a straightforward consequence of Step 1 and Corollary 1.1, p. 77, in [55] which states that the supremum over a family of l.s.c. functions is l.s.c.

Step 3: $\sup_{\mathcal{S}\times\mathcal{N}} e_D(f_S,f_N;\cdot)$ is convex on $H^2_+$. This is a straightforward computation.

Step 4: Apply Theorem 1.2, p. 79, in [55], to $\sup_{\mathcal{S}\times\mathcal{N}} e_D(f_S,f_N;\cdot)$ on $H^2_+(B)$. This Theorem 1.2 states in our case that because $H^2_+$ is a reflexive Banach space (see [4]); because $\sup_{\mathcal{S}\times\mathcal{N}} e_D(f_S,f_N;\cdot)$ is l.s.c. (Step 2), convex (Step 3) and proper (this is trivial since $e_D(f_S,f_N;H) \ge 0 > -\infty$) on $H^2_+$; and because $H^2_+(B)$ is convex, closed, and bounded (these can be deduced from the fact that $H^2_+(B)$ is just a multiple of the closed unit ball of $H^2_+$, see [10]); we have that the infimum in (3.5) is achieved by some $H_R \in H^2_+(B)$. QED.
As we noted above, the conditions of Theorem 3.1 are mild. Certainly any problem in which there was no upper bound on the signal power would be unreasonable. Further, the condition (iia) [resp. (iib)] is satisfied if $\mathcal{N}$ [resp. $\mathcal{S}$] has any of the forms discussed earlier (i.e., $\epsilon$-contaminated, etc.). The only possible nontrivial exception to this would be in the case of the p-point class if some $w_i = 0$ while $\lambda([\theta_{i-1},\theta_i]) > 0$.

We now turn our attention to the problem of finding $H_R$, the most robust transfer function. We begin with a definition.

Definition 3.1. A pair of PSD's $(f_S^L,f_N^L) \in \mathcal{S}\times\mathcal{N}$ is least favorable for causal estimation† for the uncertainty classes $\mathcal{S}$ and $\mathcal{N}$ if

$$ e_D(f_S^L,f_N^L) = \max_{\mathcal{S}\times\mathcal{N}} e_D(f_S,f_N) \tag{3.11} $$

where $e_D(f_S,f_N)$, the minimum-MSE for $(f_S,f_N)$, is defined in (3.4). Note that (3.11) means that $(f_S^L,f_N^L)$ solves the maximin game

$$ \max_{\mathcal{S}\times\mathcal{N}} \; \min_{H^2_+} e_D(f_S,f_N;H). \tag{3.12} $$

Hence, if the minimax equality holds here (i.e., if (3.12) equals (3.5)) then $(f_S^L,f_N^L)$ is a least favorable pair if and only if $(f_S^L,f_N^L)$ and its optimal causal transfer function $H_L^+$ form a saddle point solution to the game (3.5) (or, equivalently, the game (3.12)); that is, $(f_S^L,f_N^L)$ and $H_L^+$ satisfy

$$ e_D(f_S,f_N;H_L^+) \le e_D(f_S^L,f_N^L;H_L^+) \le e_D(f_S^L,f_N^L;H) \tag{3.13} $$

for all $(f_S,f_N) \in \mathcal{S}\times\mathcal{N}$ and all $H \in H^2_+$. (For a clear and thorough discussion of this point see [55], especially Section 2.3.1.) Clearly, if (3.13) holds then $H_L^+$ is a most robust causal transfer function.

†This definition differs from the ones given in [1] and [2] (see Chapter II) but is consistent with earlier notions of least favorability (see, for example, [26], p. 34). As was pointed out by Verdu ([54], p. 72), this discrepancy seems to have had its origins in a somewhat confusing discussion in [14].

Our  next  theorem  gives  some  conditions  under  which  the  optimal  transfer 
function  for  a  least  favorable  pair  is  most  robust.  This  is  useful  because 
it  is  often  easier  to  solve  the  maximization  problem  (3.11)  than  it  is  to 
solve  the  minimax  game  (3.5). 

Theorem 3.2. If the spectral uncertainty classes $\mathcal{S}$ and $\mathcal{N}$ are such that the following three conditions hold

i) $\sup_{f_S \in \mathcal{S}} \frac{1}{2\pi}\int_{-\pi}^{\pi} f_S(\theta)\,d\lambda(\theta) = w_S < \infty$,

ii) $\mathcal{S}$ and $\mathcal{N}$ are convex,

iii) At least one of (a) or (b) holds for some $\epsilon > 0$:

a) Every $f_N \in \mathcal{N}$ satisfies $f_N(\theta) \ge \epsilon > 0$ a.e. $[\lambda]$,

b) Every $f_S \in \mathcal{S}$ satisfies $f_S(\theta) \ge \epsilon > 0$ a.e. $[\lambda]$;

then a pair of PSD's $(f_S^L,f_N^L)$ in $\mathcal{S}\times\mathcal{N}$ and its optimal causal transfer function $H_L^+$ form a saddlepoint solution to the minimax game (3.5) if and only if $(f_S^L,f_N^L)$ is least favorable for causal estimation, i.e., solves (3.11) (note that in this case $H_L^+$ is a most robust causal transfer function).

Proof: We will need the following lemma (which is Theorem 2 in [11]).

Lemma 3.1: Let $X$ be a compact Hausdorff topological space and $Y$ an arbitrary set. Let $F$ be a real-valued function on $X \times Y$ such that for every $y \in Y$, $F(x,y)$ is l.s.c. on $X$. If for each $y \in Y$, $F(x,y)$ is convex on $X$ and for each $x \in X$, $F(x,y)$ is concave on $Y$ then

$$ \min_{x \in X} \sup_{y \in Y} F(x,y) = \sup_{y \in Y} \min_{x \in X} F(x,y). \tag{3.14} $$


We wish to apply Lemma 3.1 with $X = H^2_+(B)$ ($H^2_+(B)$ was defined in the proof of Theorem 3.1), $Y = \mathcal{S}\times\mathcal{N}$, and $F(x,y) = F(H,(f_S,f_N)) = e_D(f_S,f_N;H)$. Since $H^2_+$ is a reflexive Banach space, $H^2_+(B)$ is compact in the weak topology (see [10], Chapter V, especially Theorem V.4.7, or [55], Sections 1.2.2 and 1.2.3). Furthermore we saw in the proof of Theorem 3.1 that $e_D(f_S,f_N;\cdot)$ is continuous (hence l.s.c.) in the norm topology of $H^2_+(B)$ and that $e_D(f_S,f_N;\cdot)$ is convex and proper on $H^2_+(B)$. Thus by Proposition 1.5 of [55] we have that $e_D(f_S,f_N;\cdot)$ is l.s.c. in the weak topology of $H^2_+(B)$. The final condition of Lemma 3.1 is that $e_D(\cdot,\cdot;H)$ is concave on $\mathcal{S}\times\mathcal{N}$. But this is trivial since it is even linear. Thus we have shown that for any $B > 0$ we have

$$ \min_{H^2_+(B)} \sup_{\mathcal{S}\times\mathcal{N}} e_D(f_S,f_N;H) = \sup_{\mathcal{S}\times\mathcal{N}} \min_{H^2_+(B)} e_D(f_S,f_N;H). \tag{3.15} $$

Note that since the conditions of this theorem are stronger than the conditions of Theorem 3.1 we have from the proof of Theorem 3.1 that if $B$ is large enough then the left hand side of (3.15) is equivalent to (3.5). We will now show that the right hand side of (3.15) is equivalent to the right hand side of (3.11).

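If (say) condition (a) of (iii) holds then for any $(f_S,f_N) \in \mathcal{S}\times\mathcal{N}$ and any $H \in H^2_+$ we have that

$$ e_D(f_S,f_N;H) \ge \frac{1}{2\pi}\int |H|^2 f_N\,d\lambda \ge (\epsilon/2\pi)\int |H|^2\,d\lambda = (\epsilon/2\pi)\,\|H\|^2 . $$

Hence for any $B > \sqrt{2\pi\,e_D(f_S,f_N)/\epsilon}$ we have

$$ \min_{H^2_+} e_D(f_S,f_N;H) = \min_{H^2_+(B)} e_D(f_S,f_N;H). $$

Furthermore, if $B > \sqrt{2\pi \sup_{\mathcal{S}\times\mathcal{N}} e_D(f_S,f_N)/\epsilon}$ we see that the right hand side of (3.15) equals the right hand side of (3.11). We note that $\sup_{\mathcal{S}\times\mathcal{N}} e_D(f_S,f_N)$ is always finite under the conditions of Theorem 3.1 (hence under those of Theorem 3.2) because it is always less than or equal to

$$ \min_{H \in H^2_+} \sup_{\mathcal{S}\times\mathcal{N}} e_D(f_S,f_N;H). $$

Thus we have shown that under condition (iiia) $B$ can be chosen so that (3.15) implies

$$ \min_{H^2_+} \sup_{\mathcal{S}\times\mathcal{N}} e_D(f_S,f_N;H) = \sup_{\mathcal{S}\times\mathcal{N}} \min_{H^2_+} e_D(f_S,f_N;H). $$
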

Knowing this we have (see Section 2.3.1 of [55]) that (3.11) is equivalent to the existence of a saddlepoint solution to the game (3.5) and that, in particular, any solution $(f_S^L,f_N^L)$ to (3.11) and its optimal causal transfer function $H_L^+$ form a saddlepoint solution to (3.5). Thus $H_L^+$ is a most robust transfer function. Conversely, any pair of PSD's which together with its optimal transfer function forms a saddlepoint solution to (3.5) must also solve (3.11).

Finally we note that (just as in the proof of Theorem 3.1) we may obtain these same results if condition (iiib) holds instead of (iiia), and the theorem is proved.

While the conditions of Theorem 3.2 (especially (iii)) are not as innocuous as those of Theorem 3.1, they are satisfied by any $\epsilon$-contaminated model (see (3.6)) whose nominal PSD, $f^0$, satisfies $f^0(\theta) \ge \delta$ a.e. $[\lambda]$, for some $\delta > 0$, by any band model (see (3.8)) whose lower bound $f^L$ satisfies $f^L(\theta) \ge \delta$ a.e. $[\lambda]$, and hence by any banded p-point model whose lower bound satisfies this condition. This is not an unreasonable condition since it only need apply to one of $\mathcal{S}$ and $\mathcal{N}$, not both. Generally the noise will be wide-band with respect to the signal and this condition will be satisfied by $\mathcal{N}$. Alternatively, we might assume there is a small white component to the noise. This added component is sometimes called a "noise floor" and has been used, for example, by Van Trees to make the problem of detection in nonwhite noise analytically tractable. See [57], Section 4.3, for further justification.

The benefit of Theorem 3.2 stems from the fact that the maximization in (3.11) should be easier to solve than the original minimax problem (3.5). In general, this will only be so if we have a closed-form expression for $e_D(f_S,f_N)$ as $(f_S,f_N)$ ranges over $\mathcal{S}\times\mathcal{N}$. Such expressions have been found in a variety of cases by various researchers (see [3], [4], [13], [15]-[19], [33], [34]). In particular Snyders [3] has developed a general method for finding such expressions when one of $f_S$ and $f_N$ is fixed and rational while the other is completely arbitrary.


4.  Robust  Noiseless  One-Step  Prediction 

In this section we consider in greater detail the special case of prediction one step ahead of a signal with uncertain spectrum. The signal is assumed to be received in a noiseless environment. In other words, we consider the problem (3.5) with $D(\theta) = e^{-i\theta}$ and $f_N(\theta) = 0$ for all $\theta \in [-\pi,\pi]$. This special case is the one considered previously by Hosoya [23]. The conditions on our main theorem (Theorem 3.2) when applied to this case are slightly more restrictive than those of Hosoya's analogous result (Theorem 2, p. 581 in [23]). On the other hand, in [23] only the $\epsilon$-contaminated class (see (3.6)) is considered and the proofs directly depend on the specific form of the least favorable PSD for this class, whereas our treatment is valid for more general uncertainty classes.

For a known signal PSD, $f_S(\theta)$, the one-step minimum-MSE noiseless prediction problem is given by

$$ e_D(f_S,0) = \min_{H \in H^2_+} e_D(f_S,0;H) \tag{3.16} $$

where $D(\theta) = e^{-i\theta}$, and $e_D$ and $H^2_+$ are defined in Section 2. Note that

The Szegő-Kolmogorov-Krein Theorem [3], [4] states that (with $D(\theta) = e^{-i\theta}$) we have

$$ e_D(f_S,0) = \exp\left\{ \frac{1}{2\pi}\int_{-\pi}^{\pi} \log f_S(\theta)\,d\theta \right\} \tag{3.17} $$

where the right hand side is interpreted as zero if $\log f_S$ is not integrable. (Technically, the $f_S$ in (3.17) is the density of the absolutely continuous part of $f_S(\theta)d\lambda(\theta)$ with respect to Lebesgue measure on $[-\pi,\pi]$. However, since the case of greatest interest is when $\lambda$ is Lebesgue measure and since any predictor can be adjusted to perfectly predict the singular part of any signal spectrum (see [18]) we assume here that $f_S$ is just the usual (Lebesgue) PSD.)
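
As a numerical illustration of (3.17), the sketch below evaluates the formula on a uniform grid for a first-order Markov PSD of the form (3.27), for which the one-step prediction error is known to equal the innovation variance $(1-r^2)w_S$. The function names, the grid size, and the parameter values are illustrative assumptions; returning zero when the sampled PSD vanishes is only a crude stand-in for the non-integrable case.

```python
import numpy as np

def one_step_prediction_mse(f_S):
    """(3.17) on a uniform grid: exp of the grid mean of log f_S."""
    if np.any(f_S <= 0):
        return 0.0   # crude proxy for "log f_S is not integrable"
    return float(np.exp(np.mean(np.log(f_S))))

def markov_psd(theta, r, w_S):
    """First-order Markov PSD with power w_S, as in (3.27)."""
    return (1 - r**2) * w_S / (1 - 2 * r * np.cos(theta) + r**2)

theta = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
r, w_S = 0.9, 1.0                               # hypothetical parameter values
e_markov = one_step_prediction_mse(markov_psd(theta, r, w_S))
e_white = one_step_prediction_mse(np.full_like(theta, w_S))
```

A white spectrum cannot be predicted at all (the error equals the full power $w_S$), while the strongly correlated Markov signal is predicted with a much smaller error.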

We now consider the case where the signal spectrum is uncertain. As in Section 3 we assume that we know only that $f_S \in \mathcal{S}$, and we wish to find $H_R$ solving

$$ \min_{H \in H^2_+} \; \sup_{f_S \in \mathcal{S}} e_D(f_S,0;H) \tag{3.18} $$

where, again, $D(\theta) = e^{-i\theta}$. In this section we refer to $H_R$ as the most robust linear predictor. Also, we define $f_S^L$ to be least favorable for one-step noiseless prediction for $\mathcal{S}$ if $(f_S^L,0)$ satisfies Definition 3.1 with $\mathcal{N} = \{0\}$. From (3.17) we see that $f_S^L$ is least favorable for one-step noiseless prediction for $\mathcal{S}$ if and only if

$$ \int_{-\pi}^{\pi} \log f_S^L(\theta)\,d\theta = \max_{f_S \in \mathcal{S}} \int_{-\pi}^{\pi} \log f_S(\theta)\,d\theta . \tag{3.19} $$

Furthermore, under the conditions of Theorem 3.2 we have that $f_S^L$ solves (3.19) if and only if $f_S^L$ and $H_L^+$ form a saddlepoint solution to (3.18), where $H_L^+$ is the optimal linear predictor for $f_S^L$. Thus, if we can solve (3.19) then we can find the most robust linear predictor for $\mathcal{S}$.

We will now demonstrate how to obtain the exact form of a least favorable spectrum for each of the uncertainty classes discussed in Section 3. The method we use involves an analogy to robust hypothesis testing. The advantage of this is that it allows us to make use of the considerable effort already expended in finding least favorable probability densities for that problem (defined below). This analogy was the underlying basis for the solutions given in [1] and [22] and was developed explicitly in Section III of [2] (see Chapter II).

As in each of the classes of Section 3, we will assume that $\mathcal{S}$ satisfies a power constraint, i.e., that $\int_{-\pi}^{\pi} f_S(\theta)\,d\theta = 2\pi w_S$, where $0 < w_S < \infty$ is the power in the signal process $\{S(k)\}$. We can now define a class of probability densities on $[-\pi,\pi]$ by

$$ \mathcal{P}_S = \{\, p_S(\theta) \mid p_S(\theta) = f_S(\theta)/2\pi w_S,\ f_S \in \mathcal{S} \,\} \tag{3.20} $$

and consider the following pair of statistical hypotheses concerning a random variable $X$ on $[-\pi,\pi]$ and its Borel $\sigma$-field

$$ H_0:\ X \sim p(\theta) = \frac{1}{2\pi},\ \theta \in [-\pi,\pi] \qquad \text{versus} \qquad H_1:\ X \sim p_S(\theta),\ p_S \in \mathcal{P}_S . \tag{3.21} $$

For any test $\phi$ of $H_1$ versus $H_0$ define $R_j(\phi;P)$ to be the conditional risk of using $\phi$ when $X \sim P$ under $H_j$ ($j = 0,1$) (see [26]). Consider the following (see, for example, [26]).

Definition 2: For the hypothesis pair in (3.21), $q_S \in \mathcal{P}_S$ is least favorable in terms of risk for $\mathcal{P}_S$ versus $\{1/2\pi\}$ if

$$ R_1(\phi';p_S) \le R_1(\phi';q_S) \qquad \forall\, p_S \in \mathcal{P}_S $$

for every probability ratio test $\phi'$ between $q_S$ and $1/2\pi$.

Least favorable densities play an important role in the design of robust hypothesis tests (see [14], [6]). They have been found for the $\epsilon$-contaminated and total-variation models ((3.6) and (3.7), respectively) by Huber [14], [27]; for the band model (3.8) by Kassam [21]; and for the p-point model (3.9) (see Chapter V). Thus the importance of the following, which is related to Lemma 1 of [2], Theorem 2 of [21] and results in [29] and [30].

Proposition 3.1: Let $\varphi$ be any differentiable concave function on $(0,\infty)$. Let $\mathcal{P}_S$ be any convex set of probability densities on $[-\pi,\pi]$. If $q_S \in \mathcal{P}_S$ is least favorable in terms of risk for $\mathcal{P}_S$ versus the uniform distribution on $[-\pi,\pi]$ then, for all $p_S \in \mathcal{P}_S$ we have

$$ \int_{-\pi}^{\pi} \varphi(q_S(\theta))\,d\theta \ge \int_{-\pi}^{\pi} \varphi(p_S(\theta))\,d\theta . \tag{3.22} $$

Proof: We wish to show that $\int \varphi(p_S(\theta))\,d\theta$ achieves a maximum over $\mathcal{P}_S$ at $p_S = q_S$. Since $\int \varphi(\cdot)$ is concave we only need to show that

$$ \frac{d}{d\epsilon}\left[ \int \varphi[(1-\epsilon)q_S(\theta) + \epsilon p_S(\theta)]\,d\theta \right]_{\epsilon=0} \le 0, \tag{3.23} $$

$\forall\, p_S \in \mathcal{P}_S$. The left hand side of (3.23) is

$$ \lim_{\epsilon \downarrow 0} \int \frac{\varphi[(1-\epsilon)q_S(\theta) + \epsilon p_S(\theta)] - \varphi(q_S(\theta))}{\epsilon}\,d\theta . \tag{3.24} $$

But, by the concavity of $\varphi$, as $\epsilon \downarrow 0$ the function inside the integral in (3.24) converges pointwise up to $\varphi'[q_S(\theta)]\,[p_S(\theta) - q_S(\theta)]$. So the left hand side of (3.23) is less than or equal to $\int \varphi'[q_S(\theta)]\,[p_S(\theta) - q_S(\theta)]\,d\theta$. This latter term is nonpositive since (by concavity) $\varphi'$ is nonincreasing and $q_S$ is the distribution making $q_S(X)$ stochastically smallest over $\mathcal{P}_S$ (see [14], [27]), therefore implying

$$ \int \varphi'(q_S(\theta))\,p_S(\theta)\,d\theta \le \int \varphi'(q_S(\theta))\,q_S(\theta)\,d\theta . $$

This concludes the proof of Proposition 3.1.

From this proposition and Section 3 in [14] we have that if $\mathcal{S}$ is an $\epsilon$-contaminated model (i.e. has the form (3.6)) then

$$ f_S^L(\theta) = \max\{\, c',\ (1-\epsilon) f_S^0(\theta) \,\} \tag{3.25} $$

where $f_S^0$ is the nominal PSD in (3.6) and the constant $c'$ can be determined so that

$$ \int_{-\pi}^{\pi} f_S^L(\theta)\,d\theta = \int_{-\pi}^{\pi} f_S^0(\theta)\,d\theta . $$

Note that this agrees with the result given in [23]. Similarly $f_S^L$ may be found for the other uncertainty classes discussed in Section 3.
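
The least favorable spectrum for the $\epsilon$-contaminated class can be computed numerically. The sketch below assumes the clipped form $\max\{c', (1-\epsilon) f_S^0\}$ familiar from Huber's and Hosoya's results, finds $c'$ by bisection so that the power constraint holds, and compares the log-integral in (3.19) with that of two other members of the class. The function names, the nominal PSD parameters, and the flat contaminating spectrum are illustrative assumptions.

```python
import numpy as np

def least_favorable_contaminated(f0, eps, iters=200):
    """Return (max(c, (1 - eps) f0), c) with c set so the power equals that of f0."""
    target = np.mean(f0)
    lo, hi = 0.0, target   # grid power is (1 - eps) * target at c = 0, >= target at c = target
    for _ in range(iters):
        c = 0.5 * (lo + hi)
        if np.mean(np.maximum(c, (1 - eps) * f0)) < target:
            lo = c
        else:
            hi = c
    c = 0.5 * (lo + hi)
    return np.maximum(c, (1 - eps) * f0), c

def log_integral(f):
    """Grid approximation of (1/2pi) times the integral of log f, as in (3.17) and (3.19)."""
    return float(np.mean(np.log(f)))

theta = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
r, w_S, eps = 0.9, 1.0, 0.1                      # hypothetical parameter values
f0 = (1 - r**2) * w_S / (1 - 2 * r * np.cos(theta) + r**2)
f_L, c = least_favorable_contaminated(f0, eps)

# Another member of the class: the nominal contaminated by a flat spectrum of equal power.
f_other = (1 - eps) * f0 + eps * w_S

e_nominal = np.exp(log_integral(f0))       # one-step prediction error at the nominal
e_other = np.exp(log_integral(f_other))
e_least_fav = np.exp(log_integral(f_L))    # largest of the three
```

The clipping level acts like a water level: power removed from the nominal by the factor $(1-\epsilon)$ is used to fill in the spectral valleys, which yields the flattest admissible spectrum.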

We also note that $-\int_{-\pi}^{\pi} \varphi(p(\theta))\,d\theta$, as in Proposition 3.1, may be thought of as a "measure of the distance" between $p_S$ and Lebesgue measure on $[-\pi,\pi]$. $-\int \varphi(\cdot)$ is related to a class of divergences sometimes referred to as Ali-Silvey distances (see [29]-[31], [21]). Thus, Proposition 3.1 says that $f_S^L$ is an element of $\mathcal{S}$ which is "closest" to Lebesgue measure. This makes $f_S^L$ of (3.25) intuitively appealing since it is the "flattest" element of $\mathcal{S}$.

Another interesting fact about $f_S^L$ is that it is a least favorable spectral density for the problem of calculating the rate-distortion function of the class of discrete-parameter homogeneous Gaussian sources whose spectra are contained in an $\epsilon$-contaminated class (see [32]).

5.  Robust  Filtering  in  White  Noise:  Numerical  Results 

Abstractly, the significance of robust signal estimation is clear: to be able to put the tightest upper bound on the error when the possibility of deviation from the assumed spectra exists is clearly desirable. However, as we discussed in Chapter II, in most situations we must also expect that the robust estimator will not perform as well as the assumed (or nominal) estimator if the true spectra are the nominal spectra. So there is a trade-off. Thus the questions that naturally arise are how much is gained by the robust estimator in its worst case (at $(f_S^L,f_N^L)$) as compared to the nominal estimator at its worst case and how much is lost in using the robust estimator should the true spectra be the nominal ones. Clearly a blanket statement of the superiority of one estimator over the other in all cases is not likely to prove correct. Thus, we consider these questions for two numerical examples.

Specifically, we consider robust filtering of a first-order Markov signal in white noise. It has been shown [33], [34] that, if $f_N^W$ is the spectrum of white noise (i.e., $f_N^W(\theta) = N_0$ for some positive constant $N_0$) and if $D(\theta)$ represents filtering (i.e., $D(\theta) = 1$), then

$$ e_1(f_S,f_N^W) = N_0\left\{ 1 - \exp\left[ -\frac{1}{2\pi}\int_{-\pi}^{\pi} \log\!\big(1 + N_0^{-1} f_S(\theta)\big)\,d\theta \right] \right\} \tag{3.26} $$

for any signal PSD $f_S$. In view of Eq. (3.26), the results of Section 3 and Proposition 3.1 in Section 4 can be applied to any convex signal uncertainty class $\mathcal{S}$. In particular, for normalized classes, results from robust hypothesis testing can be used to obtain least favorable signal spectra and hence most-robust filters.
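
Formula (3.26) is straightforward to evaluate on a grid. The sketch below does so and checks it against the one case where the answer is elementary: a white signal of power $w_S$ in white noise, where the best filter is the constant gain $w_S/(w_S+N_0)$ and the error is $N_0 w_S/(N_0+w_S)$. The function name, the grid, and the parameter values are illustrative assumptions.

```python
import numpy as np

def causal_filter_mse_white_noise(f_S, N0):
    """(3.26): minimum MSE of causal filtering (D = 1) in white noise of level N0."""
    return float(N0 * (1.0 - np.exp(-np.mean(np.log1p(f_S / N0)))))

theta = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
w_S, N0, r = 2.0, 1.0, 0.9                       # hypothetical parameter values

f_flat = np.full_like(theta, w_S)                # white signal of power w_S
f_markov = (1 - r**2) * w_S / (1 - 2 * r * np.cos(theta) + r**2)   # same power, peaked

e_flat = causal_filter_mse_white_noise(f_flat, N0)
e_markov = causal_filter_mse_white_noise(f_markov, N0)
```

Because $\log(1 + x/N_0)$ is concave, the flat spectrum gives the largest error among spectra of equal power, which is the mechanism behind the least favorable spectra found via Proposition 3.1.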

For a wide variety of applications it is appropriate [35] to assume that the noise is white and that the signal is first-order Markov, i.e., has PSD $f_S^0$ where

$$ f_S^0(\theta) = \frac{(1-r^2)\,w_S}{1 - 2r\cos\theta + r^2} \tag{3.27} $$

for some $r \in [-1,1]$. A process with this spectrum has power $w_S$ and, for $r \ge 3 - 2\sqrt{2} \approx 0.172$, it has 3 dB power bandwidth

$$ \cos^{-1}\left[ \frac{-r^2 + 4r - 1}{2r} \right]. \tag{3.28} $$

Substituting the expression for $f_S^0$ into (3.26) we obtain $e_1(f_S^0,f_N^W)$. Alternatively, since $f_S^0(\theta)$ is rational, we can determine that the optimal causal transfer function for the nominal pair $(f_S^0,f_N^W)$ is given by

$$ H_0^+(\theta) = \frac{K}{1 - a e^{-i\theta}}, \qquad \theta \in [-\pi,\pi], \tag{3.29} $$

where

$$ K = \frac{a\,w_S(1-r^2)}{N_0\,r\,(1-ra)}, \qquad a = b - \sqrt{b^2 - 1}, \qquad b = \frac{1}{2r}\left[ \frac{w_S(1-r^2)}{N_0} + (1+r^2) \right] $$

(for $0 < r < 1$). We can then substitute $f_S^0$, $f_N^W$, and (3.29) into (3.3) to obtain $e_1(f_S^0,f_N^W)$.
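
The two routes to $e_1(f_S^0,f_N^W)$ described above can be compared numerically. The sketch below builds a filter of the form (3.29) with constants re-derived by spectral factorization for $0 < r < 1$ (so $a$, $b$ and $K$ as coded are our derivation, with $K = 1 - a/r$, and are an assumption rather than a transcription), substitutes it into (3.3), and compares with (3.26). It also checks the power and the half-power frequency of (3.27). Parameter values are hypothetical.

```python
import numpy as np

def markov_psd(theta, r, w_S):
    """First-order Markov PSD (3.27)."""
    return (1 - r**2) * w_S / (1 - 2 * r * np.cos(theta) + r**2)

def nominal_filter(theta, r, w_S, N0):
    """Filter K / (1 - a e^{-i theta}) for (3.27) in white noise, 0 < r < 1.

    a is the root inside the unit circle of a^2 - 2 b a + 1 = 0.
    """
    b = (w_S * (1 - r**2) / N0 + 1 + r**2) / (2 * r)
    a = b - np.sqrt(b**2 - 1)
    K = 1 - a / r
    return K / (1 - a * np.exp(-1j * theta)), a, K

def mse(D, H, f_S, f_N):
    """(3.3) on a uniform grid with Lebesgue measure."""
    return float(np.mean(np.abs(D - H) ** 2 * f_S + np.abs(H) ** 2 * f_N))

theta = np.linspace(-np.pi, np.pi, 8192, endpoint=False)
r, w_S, N0 = 0.9, 1.0, 0.5
f0 = markov_psd(theta, r, w_S)
H0, a, K = nominal_filter(theta, r, w_S, N0)

power = float(np.mean(f0))                                    # close to w_S
e_direct = mse(1.0, H0, f0, N0)                               # (3.3) with D = 1
e_formula = float(N0 * (1 - np.exp(-np.mean(np.log1p(f0 / N0)))))   # (3.26)

# Half-power check: at the frequency below, f0 is half its value at theta = 0.
bw = np.arccos((-r**2 + 4 * r - 1) / (2 * r))
half_ratio = markov_psd(bw, r, w_S) / markov_psd(0.0, r, w_S)

out_snr_db = 10 * np.log10(w_S / e_direct)                    # output SNR in dB
```

Since both spectra are even in $\theta$, the MSE is unchanged if $e^{-i\theta}$ is replaced by $e^{i\theta}$ in the filter, so the comparison does not depend on the sign convention of the transform.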


As in Chapter II, we use a measure of performance in the figures which we refer to simply as output signal-to-noise ratio (SNR). The purpose of Wiener-Kolmogorov filtering is to minimize the MSE, $E\{|\hat{S}(t) - S(t)|^2\}$, between our estimate $\hat{S}(t)$ (i.e., the output of the filter) and the actual signal $S(t)$. Since the output of the filter can be written as $S(t) + (\hat{S}(t) - S(t))$, we use the signal power divided by the MSE as an output SNR. For the purpose of our graphs we translate this to dB. The horizontal axis is $10\log_{10}(w_S/N_0)$, the input SNR in dB.

The top line in Fig. 8 gives the performance of the filter $H_0^+$, given in (3.29), (which is optimal for the pair $(f_S^0,f_N^W)$) when $f_S^0$ and $f_N^W$ are, in fact, the signal and noise spectra which occur. In Fig. 8, the signal bandwidth is 0.103.

Suppose, now, that we are not completely certain about our choice of signal spectrum. In particular, assume we know only the total signal power $w_S$ and that the fractional power $w_1$ on the set $A_1 = \{\theta \mid |\theta| < 0.125\}$ is given by $w_1 = 0.555\,w_S$. In other words, we are modeling our uncertainty about the signal by defining $\mathcal{S}$ as the p-point class (3.9) with $A_1$ and $w_1$ as above and $A_2 = A_1^c$ and $w_2 = 0.445\,w_S$.

If all we know is that $f_S \in \mathcal{S}$ then we would like to know how badly the performance of $H_0^+$ can deteriorate. The bottom curved line in Fig. 8 gives the worst case performance of $H_0^+$. The straight line in Fig. 8 was included to show how bad this deterioration is. It gives the performance of an all-pass filter (i.e. $H \equiv 1$).

So we see that the performance of the optimal or nominal filter $H_0^+$ can deteriorate by as much as 3 dB; and, for input SNR above 6 dB, its performance can be significantly worse than no filtering at all. Thus there is a clear need for robust filtering.

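Applying Proposition 3.1 with $\varphi(x) = \log(1 + x/N_0)$ (see (3.26)), it
can be shown (see Chapter V) that $f_S^L$ is least favorable for causal
filtering for the uncertainty classes $\mathcal{S}$ and
$\mathcal{N} = \{f_N^W\}$ if $f_S^L$ is given by

$$f_S^L(\theta) = \begin{cases} 2\pi w_1 / 0.25 & \text{for } \theta \in A_1 \\ 2\pi w_2 / (2\pi - 0.25) & \text{for } \theta \in A_2 \end{cases} \tag{3.30}$$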
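As a quick sanity check on (3.30), the following sketch verifies that the piecewise-constant spectrum carries power $w_1$ on $A_1$ and $w_2$ on $A_2$. It assumes the normalization in which power equals $(1/2\pi)$ times the integral of the spectrum; the function names are hypothetical and chosen only for this illustration.

```python
import math


def least_favorable_levels(w_s, frac_1=0.555, half_width=0.125):
    """Constant levels of the spectrum in (3.30) on A1 and on A2.

    Assumes A1 = {|theta| < half_width} and A2 is its complement in
    [-pi, pi], as in the first example.
    """
    w1 = frac_1 * w_s
    w2 = w_s - w1
    len_a1 = 2.0 * half_width
    len_a2 = 2.0 * math.pi - len_a1
    level_a1 = 2.0 * math.pi * w1 / len_a1
    level_a2 = 2.0 * math.pi * w2 / len_a2
    return level_a1, level_a2, len_a1, len_a2


def band_powers(w_s):
    """Power on A1 and on A2, using power = (1/2pi) * integral of the PSD."""
    level_a1, level_a2, len_a1, len_a2 = least_favorable_levels(w_s)
    p1 = level_a1 * len_a1 / (2.0 * math.pi)
    p2 = level_a2 * len_a2 / (2.0 * math.pi)
    return p1, p2
```

Calling `band_powers(1.0)` should recover the fractional powers 0.555 and 0.445 up to rounding.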


It is clear that the hypotheses of Theorem 3.2 (specifically i, ii,
and iiia) are satisfied for this $\mathcal{S}$ and $\mathcal{N}$. Thus, the
most robust filter $H_R^\dagger$ is the optimal filter for $(f_S^L, f_N^W)$.
It can be seen from [16] and (3.29) that $|1 - H_R^\dagger(\theta)|^2$ is
constant on $A_1$ and on $A_2$. Since every element of $\mathcal{S}$ has the
same power on $A_1$ and on $A_2$, the error $e_1(f_S, f_N^W; H_R^\dagger)$ is
constant over all $f_S \in \mathcal{S}$; in particular,
$e_1^\dagger(f_S^L, f_N^W) = e_1(f_S, f_N^W; H_R^\dagger)$ for all
$f_S \in \mathcal{S}$, and we can calculate $e_1^\dagger(f_S^L, f_N^W)$ using
(3.26). The constant (over $\mathcal{S}$) performance of $H_R^\dagger$ is
given by the second line from the top in Fig. 8.

It is clear from Fig. 8 that, unless we are very confident about our
original choice of signal spectrum, $f_S^0$, the robust filter is preferable
to $H_0^\dagger$.


For our second example, we assume that we have again chosen $f_S^0$
(first-order Markov signal) and $f_N^W$ (white noise) as our signal and
noise spectra; but instead of assuming we know the fractional powers, as
in the previous example, we assume now that we have a general sense of
uncertainty about our choice of signal spectrum, i.e., we assume that
$\mathcal{N} = \{f_N^W\}$ and that $\mathcal{S}$ is an $\epsilon$-contaminated
class (see (3.6)) with nominal spectrum $f_S^0$ given by (3.27).

Of course, $e_1(f_S^0, f_N^W; H_0^\dagger)$ is the same as in the previous
example; in particular, $e_1^\dagger(f_S^0, f_N^W)$ is the same. From the
expression for $e_1(f_S, f_N^W; H_0^\dagger)$ we can calculate

$$e_1(f_S^{WC}, f_N^W; H_0^\dagger) = \sup_{f_S \in \mathcal{S}} e_1(f_S, f_N^W; H_0^\dagger).$$

It is easy to see that $f_S^{WC}$ (WC stands for worst case) is given by
$(1-\epsilon) f_S^0 + \epsilon f_S'$, for any $f_S'$ satisfying
$f_S'(\theta) = 2\pi w_1 \delta(\theta - \pi) + 2\pi w_2 \delta(\theta + \pi)$ where
$w_1 + w_2 = w_S$, since

$$|1 - H_0^\dagger(-\pi)|^2 = |1 - H_0^\dagger(\pi)|^2 = \sup_{\theta \in [-\pi,\pi]} |1 - H_0^\dagger(\theta)|^2.$$

(Of course, these $f_S'$ are not actual PSD's, but since we can get
arbitrarily close to their value in $e_1(f_S', f_N^W; H_0^\dagger)$ using the
usual limit arguments for Dirac delta functions we will use this
notation.) The performances of $H_0^\dagger$ at $(f_S^0, f_N^W)$ and
$(f_S^{WC}, f_N^W)$ are given by the top and bottom curved lines,
respectively, in Fig. 9.

For Fig. 9, $\epsilon = 0.1$ and the signal bandwidth is 0.001. As in the
previous example, the straight line gives the performance of an all-pass
filter; and, for input SNR between 10 and 60 dB, the performance of the
"optimal" filter $H_0^\dagger$ can be much worse than that of the all-pass
filter. Once again, a clear need for robust filtering exists.
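The worst-case calculation for an $\epsilon$-contaminated class can be sketched numerically: for a fixed filter, the contaminating power is placed where $|1 - H(\theta)|^2$ is largest. The sketch below is only an illustration under stated assumptions. The filter `filt` is a hypothetical one-pole causal low-pass filter (not the filter of (3.29)), the Markov PSD uses one common normalization (not necessarily that of (3.27)), and all parameter values are made up.

```python
import cmath
import math


def markov_psd(theta, w_s=1.0, r=0.9):
    """First-order Markov PSD with (1/2pi) * integral equal to w_s (assumed form)."""
    return w_s * (1.0 - r * r) / (1.0 + r * r - 2.0 * r * math.cos(theta))


def filt(theta, g=0.8, b=0.5):
    """Hypothetical causal one-pole filter, used only for illustration."""
    return g * (1.0 - b) / (1.0 - b * cmath.exp(-1j * theta))


def errors(eps=0.1, w_s=1.0, n0=0.1, n=4096):
    """Nominal and worst-case MSE of `filt` over the eps-contaminated class."""
    grid = [-math.pi + 2.0 * math.pi * k / n for k in range(n)]
    gain = [abs(1.0 - filt(t)) ** 2 for t in grid]
    # (1/2pi) * integral of |1 - H|^2 f_S^0, as an equispaced Riemann sum.
    sig = sum(gk * markov_psd(t, w_s) for gk, t in zip(gain, grid)) / n
    # White noise of level n0 contributes n0 * (1/2pi) * integral of |H|^2.
    noise = n0 * sum(abs(filt(t)) ** 2 for t in grid) / n
    peak = max(gain)
    nominal = sig + noise
    # Contamination of total power eps * w_s concentrated at the peak of |1 - H|^2.
    worst = (1.0 - eps) * sig + eps * w_s * peak + noise
    return nominal, worst, grid[gain.index(peak)]
```

For this low-pass filter the peak of $|1 - H|^2$ sits at the band edge, which mirrors the role of $\theta = \pm\pi$ in the argument above.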

In order to find $f_S^L \in \mathcal{S}$, we can apply Proposition 3.1 with
the same $\varphi$ function as in the first example. Thus we see that the
least favorable spectrum $f_S^L$ is given by (3.25) if $f_S^0$ is the
first-order Markov spectral density given in (3.27).

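We can now calculate $e_1^\dagger(f_S^L, f_N^W)$ by substituting (3.25) into
(3.26). On the other hand, $e_1(f_S^0, f_N^W; H_R^\dagger)$ is more difficult
to calculate. Fortunately, Yao has developed an expression (see equation
(36') and ff. in [16]) which can be used to find
$|1 - H_R^\dagger(\theta)|^2$. And we can write $e_1(f_S^0, f_N^W; H_R^\dagger)$ as

$$e_1(f_S^0, f_N^W; H_R^\dagger) = e_1^\dagger(f_S^L, f_N^W) - \frac{1}{2\pi} \int_{-\pi}^{\pi} |1 - H_R^\dagger(\theta)|^2 \left(f_S^L(\theta) - f_S^0(\theta)\right) d\theta. \tag{3.31}$$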


The second and third lines from the top in Fig. 9 give the performance
of $H_R^\dagger$ at $(f_S^0, f_N^W)$ and $(f_S^L, f_N^W)$, respectively. For
input SNR below 0 dB or above 60 dB there is essentially no difference
between $H_0^\dagger$ and $H_R^\dagger$; between 0 and 30 dB the
insensitivity of $H_R^\dagger$ makes it preferable to $H_0^\dagger$ unless we
are fairly certain about our choice of $f_S^0$; and between 30 and 60 dB
$H_R^\dagger$ is clearly preferable to $H_0^\dagger$. We also note that above
20 dB the performance of $H_R^\dagger$ is the same as the all-pass filter.
Hence, for high input SNR, we are better off doing no filtering at all
than using $H_R^\dagger$ or $H_0^\dagger$.


The examples given in this section are in close agreement with those
for the continuous-time case given in Chapter II and the appendix. Based
on this experience we can conclude that the robust filter design developed


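[Figure: output SNR (dB) versus input SNR (dB).]

FIG. 9. $\epsilon$-Contaminated Example. (From top to bottom) $H_0^\dagger$ at
$(f_S^0, f_N^W)$; $H_R^\dagger$ at $(f_S^0, f_N^W)$; $H_R^\dagger$ at
$(f_S^L, f_N^W)$ ($H_R^\dagger$'s worst case); $H_0^\dagger$ at
$(f_S^{WC}, f_N^W)$ ($H_0^\dagger$'s worst case) (straight line is
performance of trivial all-pass filter).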


In the preceding sections we have modeled spectral uncertainty via
classes of bounded spectral densities. While this formulation is quite
general and allows for accurate modeling of uncertainty in most situations,
there are still a number of practical circumstances in which greater
generality is needed. For example, in many situations there can be a
contamination of an assumed noise PSD by small amounts of noise generated
by rotating machinery located in close proximity to the receiver. This
noise is often best modeled by sinusoids of random phase at frequencies
which are imprecisely known. Similarly, in an active sonar system, for
example, such contamination can be present in the signal spectrum due to
engine noise generated by the target or due to jamming (see [56]).

Thus, it is clear that a completely general model of spectral
uncertainty should include pure tones. This is modeled mathematically
using spectral distributions or, equivalently, spectral measures. This
means, for example, that if the signal covariance function is $R_S(k)$ then
the spectral measure $m_S$ of the signal is a Borel measure on $[-\pi,\pi]$
satisfying

$$R_S(k) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{ik\theta}\, dm_S(\theta)$$

(See [36], Chapter X, Theorem 3.2). The spectral measure $m_N$ of the noise
is similarly defined.

In this setting (3.3), the MSE, becomes

$$e_D(m_S, m_N; H) \triangleq \frac{1}{2\pi} \int_{-\pi}^{\pi} |D(\theta) - H(\theta)|^2\, dm_S(\theta) + \frac{1}{2\pi} \int_{-\pi}^{\pi} |H(\theta)|^2\, dm_N(\theta) \tag{3.32}$$

where D and H are the transfer functions of the desired and performed
operations, respectively, and now must be mean-square integrable with
respect to $m_S$ and $m_S + m_N$, respectively. Furthermore we have that
such a transfer function H is causal if and only if
$H \in H_+^2(d(m_S + m_N)(\theta))$, the Hardy subspace of
$L^2(d(m_S + m_N)(\theta))$ (see Section 2). We also note that the
minimum MSE, equation (3.4), becomes

$$e_D^\dagger(m_S, m_N) \triangleq \min_{H \in H_+^2(d(m_S + m_N)(\theta))} e_D(m_S, m_N; H). \tag{3.33}$$
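To make the spectral-measure formulation concrete, the following sketch evaluates the covariance representation and the MSE (3.32) for a measure consisting of a continuous part (a density) plus point masses (pure tones). It assumes the $1/2\pi$ normalization, under which a process of power $w$ has total spectral mass $2\pi w$. The helper names, the white-noise level, and the tone parameters are hypothetical.

```python
import cmath
import math


def integrate(g, density, atoms, n=4096):
    """(1/2pi) * integral of g(theta) dm(theta), where m = density part + atoms.

    density: function theta -> spectral density of the continuous part
    atoms:   list of (frequency, mass) pairs representing pure tones
    """
    grid = (-math.pi + 2.0 * math.pi * j / n for j in range(n))
    cont = sum(g(t) * density(t) for t in grid) / n
    disc = sum(mass * g(freq) for freq, mass in atoms) / (2.0 * math.pi)
    return cont + disc


def covariance(k, measure):
    """R(k) for a spectral measure given as a (density, atoms) pair."""
    return integrate(lambda t: cmath.exp(1j * k * t), *measure)


def mse(d, h, signal, noise):
    """e_D(m_S, m_N; H) as in (3.32), for transfer functions d and h."""
    return (integrate(lambda t: abs(d(t) - h(t)) ** 2, *signal)
            + integrate(lambda t: abs(h(t)) ** 2, *noise))


N0 = 0.5           # white-noise level (hypothetical)
TONE_POWER = 0.2   # power of the contaminating sinusoid (hypothetical)
THETA0 = 0.7       # tone frequency in radians (hypothetical)

# White noise plus a real sinusoid: two atoms of mass pi * power at +/- THETA0.
NOISE = (lambda t: N0, [(THETA0, math.pi * TONE_POWER),
                        (-THETA0, math.pi * TONE_POWER)])
SIGNAL = (lambda t: 1.0, [])   # unit-power white signal, for illustration only
```

With these choices `covariance(k, NOISE)` should equal `N0` plus `TONE_POWER` at lag zero and `TONE_POWER * cos(k * THETA0)` at other lags, and the all-pass filter with $D \equiv 1$ should have an MSE equal to the total noise power.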


We now consider, as in Section 3, the situation where we only know
that for some classes $\mathcal{S}$ and $\mathcal{N}$ of spectral measures the
true signal spectrum satisfies $m_S \in \mathcal{S}$ and the true noise
spectrum satisfies $m_N \in \mathcal{N}$. In this setting a most robust causal
transfer function is a solution to the game

$$\inf_{H \in \mathcal{K}} \; \sup_{(m_S, m_N) \in \mathcal{S} \times \mathcal{N}} e_D(m_S, m_N; H) \tag{3.34}$$

where $\mathcal{K} = \bigcap_{(m_S, m_N) \in \mathcal{S} \times \mathcal{N}} H_+^2(d(m_S + m_N)(\theta))$.
We do not need to consider any H not in $\mathcal{K}$ because if
$H \notin \mathcal{K}$ then either H is not causal or for some
$(m_S, m_N) \in \mathcal{S} \times \mathcal{N}$ we have that
$e_D(m_S, m_N; H) = \infty$. Similarly to Definition 3.1 we define
$(m_S^L, m_N^L)$ to be a least favorable pair of spectral measures for
$\mathcal{S}$ and $\mathcal{N}$ if

$$e_D^\dagger(m_S^L, m_N^L) = \max_{(m_S, m_N) \in \mathcal{S} \times \mathcal{N}} e_D^\dagger(m_S, m_N) \tag{3.35}$$

where $e_D^\dagger(m_S, m_N)$ is defined in (3.33).

Of course, our objective at this point is to prove results like
Theorems 3.1 and 3.2 for this more general setting; i.e., to show that
a most robust transfer function always exists and that if a least
favorable pair exists then the optimal transfer function for that pair is
a most robust transfer function. We have been unable to show that this is
the case here. However, the following result (Theorem 3.3) is in
some sense symmetric to Theorems 3.1 and 3.2. That is, this result
states (under a mild condition on the uncertainty classes $\mathcal{S}$ and
$\mathcal{N}$) that a least favorable pair always exists and that a most
robust transfer function exists if and only if the optimal transfer
function for the least favorable pair is also a most robust transfer
function.

In order to formulate the hypothesis of Theorem 3.3, we consider
$\mathcal{S}$ and $\mathcal{N}$ as subsets of the Banach space $\mathcal{B}$ of
Borel measures on $[-\pi,\pi]$ (see [4]). We consider $\mathcal{B}$ to be
endowed with the topology induced by $\mathcal{C}$, the Banach space of
continuous functions on $[-\pi,\pi]$; i.e., a sequence
$\{m_n\} \subset \mathcal{B}$ converges to $m \in \mathcal{B}$ in this topology
if and only if, for all $f \in \mathcal{C}$,

$$\int_{-\pi}^{\pi} f(\theta)\, dm_n(\theta) \to \int_{-\pi}^{\pi} f(\theta)\, dm(\theta). \tag{3.36}$$

Note that $\mathcal{B}$ is the dual space of $\mathcal{C}$ and that, in this
setting, this topology is called the $\mathcal{C}$-topology or weak* topology
on $\mathcal{B}$ (see [24] or [10]). In probability theory a sequence
$\{m_n\}$ of probability measures is said to be weakly convergent if (3.36)
holds for some $m \in \mathcal{B}$ (see [9]).
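A small numerical illustration of (3.36) may help: narrow-band measures of fixed total mass converge in the weak* sense to a point mass (a pure tone), even though no density exists for the limit. The sketch below is an assumption-laden example, not part of the original development; it integrates the continuous test function $\cos\theta$ against uniform "box" measures of shrinking width centered at a hypothetical frequency.

```python
import math


def integrate_against_box(f, center, half_width, mass, steps=2000):
    """Midpoint rule for the integral of f against a uniform measure of total
    mass `mass` supported on [center - half_width, center + half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        theta = center - half_width + (k + 0.5) * h
        total += f(theta)
    return mass * total / steps


def weak_star_gap(n, center=1.0, mass=2.0 * math.pi):
    """|integral of cos dm_n - integral of cos dm|, where m_n is the box of
    half-width 1/n and m is the point mass at `center` with the same mass."""
    approx = integrate_against_box(math.cos, center, 1.0 / n, mass)
    limit = mass * math.cos(center)
    return abs(approx - limit)
```

The gap shrinks as n grows, which is the convergence in (3.36) for this particular test function; the total variation distance between each box measure and the point mass, by contrast, does not shrink.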

We  are  now  ready  to  present  the  main  theorem  of  this  section. 

Theorem 3.3: Assume the transfer function of the desired operation
$D(\theta)$ is continuous on $[-\pi,\pi]$. If the spectral uncertainty classes
$\mathcal{S}$ and $\mathcal{N}$ are convex and weak* compact then there exists
a least favorable pair $(m_S^L, m_N^L) \in \mathcal{S} \times \mathcal{N}$; i.e.
$(m_S^L, m_N^L)$ satisfies (3.35). Furthermore, a most robust transfer
function exists if and only if the optimal transfer function for
$(m_S^L, m_N^L)$, $H_L^\dagger$, is a most robust transfer function; that is,
if and only if

$$e_D^\dagger(m_S^L, m_N^L) = \max_{(m_S, m_N) \in \mathcal{S} \times \mathcal{N}} e_D(m_S, m_N; H_L^\dagger) \tag{3.37}$$

holds.

Proof. The proof of this theorem is quite similar to that of Theorem 3.2.
In fact, we begin by applying Lemma 3.1. Let
$X = \mathcal{S} \times \mathcal{N}$ (endowed with the weak* product topology),
$Y = \mathcal{K} \cap \mathcal{C}$ and $F(x,y) = -e_D(m_S, m_N; H)$. $F(x,y)$ is
clearly convex on X and concave on Y, and X is compact by hypothesis. We
have only to show that $F(\cdot, y)$ is lower semicontinuous on X. We will
actually show that it is continuous. Let
$(m_S^n, m_N^n) \in \mathcal{S} \times \mathcal{N}$ converge to
$(m_S^0, m_N^0) \in \mathcal{S} \times \mathcal{N}$ in the (weak* product)
topology of $\mathcal{S} \times \mathcal{N}$. Since
$H \in \mathcal{K} \cap \mathcal{C}$ and D is continuous by hypothesis, we have
that $|D(\theta) - H(\theta)|^2 \in \mathcal{C}$ and
$|H(\theta)|^2 \in \mathcal{C}$. Hence, from equation (3.36) we have that

$$\int_{-\pi}^{\pi} |D(\theta) - H(\theta)|^2\, dm_S^n(\theta) \to \int_{-\pi}^{\pi} |D(\theta) - H(\theta)|^2\, dm_S^0(\theta)$$

and

$$\int_{-\pi}^{\pi} |H(\theta)|^2\, dm_N^n(\theta) \to \int_{-\pi}^{\pi} |H(\theta)|^2\, dm_N^0(\theta).$$

Hence $e_D(m_S^n, m_N^n; H) \to e_D(m_S^0, m_N^0; H)$, i.e.,
$e_D(\cdot, \cdot; H)$ is continuous on $\mathcal{S} \times \mathcal{N}$. We can
now apply Lemma 3.1 to yield

$$\max_{\mathcal{S} \times \mathcal{N}} \; \inf_{\mathcal{K} \cap \mathcal{C}} e_D(m_S, m_N; H) = \inf_{\mathcal{K} \cap \mathcal{C}} \; \max_{\mathcal{S} \times \mathcal{N}} e_D(m_S, m_N; H). \tag{3.38}$$


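Now, for each pair $(m_S, m_N) \in \mathcal{S} \times \mathcal{N}$, we have that
$\mathcal{C}_+ = \{H \in \mathcal{C} \mid H \text{ is causal}\}$ (i.e.
$\mathcal{C}_+$ is the subspace of $\mathcal{C}$ spanned by
$\{e^{in\theta} \mid n = 0, 1, 2, \ldots\}$) is dense in
$H_+^2(d(m_S + m_N)(\theta))$. We have that
$\mathcal{C}_+ \subseteq H_+^2(d(m_S + m_N)(\theta))$ since H continuous on
$[-\pi,\pi]$ implies H is bounded and, hence,

$$\int_{-\pi}^{\pi} |H(\theta)|^2\, d(m_S + m_N)(\theta) \le \left[\sup_{\theta} |H(\theta)|^2\right] (m_S + m_N)([-\pi,\pi]) < \infty.$$

That $\mathcal{C}_+$ is dense in $H_+^2(d(m_S + m_N)(\theta))$ now follows from
the fact that they are both spanned by
$\{e^{in\theta} \mid n = 0, 1, 2, \ldots\}$. This implies that for each
$(m_S, m_N) \in \mathcal{S} \times \mathcal{N}$ we have
$\mathcal{C}_+ = \mathcal{K} \cap \mathcal{C} = H_+^2(d(m_S + m_N)(\theta)) \cap \mathcal{C}$
and we have

$$\inf_{\mathcal{K} \cap \mathcal{C}} e_D(m_S, m_N; H) = \min_{H_+^2(d(m_S + m_N)(\theta))} e_D(m_S, m_N; H).$$

This and (3.38) imply

$$\inf_{\mathcal{K} \cap \mathcal{C}} \; \max_{\mathcal{S} \times \mathcal{N}} e_D(m_S, m_N; H) = \max_{\mathcal{S} \times \mathcal{N}} \; \min_{H_+^2(d(m_S + m_N)(\theta))} e_D(m_S, m_N; H). \tag{3.39}$$

But for any minimax problem we have sup inf $\le$ inf sup, hence the right
hand side of (3.39) is less than or equal to

$$\inf_{\mathcal{K}} \; \max_{\mathcal{S} \times \mathcal{N}} e_D(m_S, m_N; H) \le \inf_{\mathcal{K} \cap \mathcal{C}} \; \max_{\mathcal{S} \times \mathcal{N}} e_D(m_S, m_N; H). \tag{3.40}$$

This last inequality holds because
$\mathcal{K} \cap \mathcal{C} \subseteq \mathcal{K}$, where $\mathcal{K}$ is the
intersection of the spaces $H_+^2(d(m_S + m_N)(\theta))$. Since the right
hand side of (3.40) equals the left hand side of (3.39) we have

$$\inf_{\mathcal{K}} \; \max_{\mathcal{S} \times \mathcal{N}} e_D(m_S, m_N; H) = \max_{\mathcal{S} \times \mathcal{N}} \; \min_{H_+^2(d(m_S + m_N)(\theta))} e_D(m_S, m_N; H) = \max_{\mathcal{S} \times \mathcal{N}} e_D^\dagger(m_S, m_N). \tag{3.41}$$

In particular, there exists a pair
$(m_S^L, m_N^L) \in \mathcal{S} \times \mathcal{N}$ solving the right hand side
of (3.41), i.e. $(m_S^L, m_N^L)$ is a least favorable pair. Further, we
see from (3.41) that a saddle point exists if and only if a most robust
transfer function (i.e. a solution to the left hand side of (3.41))
exists. Finally, this implies (see [55], Section 2.3.1) that a transfer
function $H_R^\dagger$ is most robust if and only if it satisfies

$$e_D(m_S^L, m_N^L; H_R^\dagger) = \min_{H_+^2(d(m_S^L + m_N^L)(\theta))} e_D(m_S^L, m_N^L; H).$$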


Hence, by the uniqueness of optimal transfer functions (see [3]), if
$H_R^\dagger$ is most robust then $H_R^\dagger = H_L^\dagger$ a.e.
$[m_S^L + m_N^L]$ and $H_L^\dagger$ is a most robust causal transfer
function. QED

For the remainder of this section we will consider a general
formulation of uncertainty classes involving 2-alternating capacities
(defined below). As we shall see, this formulation includes most types
of classes that have been used to model uncertainty in the robustness
literature and all such classes are convex and weak* compact, i.e. they
satisfy the hypothesis of Theorem 3.3 regarding $\mathcal{S}$ and
$\mathcal{N}$. Furthermore, least favorable pairs have been found for many of
these classes (in particular, the five discussed below) for the analogous
problem of robust hypothesis testing, and we have shown in Sections 4 and
5 how these least favorable pairs for robust hypothesis testing can often
be used to find pairs which are least favorable for robust signal
estimation. Finally, we note that 2-alternating capacities are central to
efforts being made to develop unifying theories of robust statistics and
of robust statistical communication (see [37], also see [6], [7], [32]
and Chapters IV and V).

We define $\mathcal{B}_+ = \{m \in \mathcal{B} \mid m \text{ is nonnegative}\}$ and
we let $\mathcal{A}$ denote the Borel $\sigma$-algebra on $[-\pi,\pi]$. Suppose
$\mathcal{M} \subset \mathcal{B}_+$. We will be thinking of $\mathcal{M}$ as a
possible uncertainty class of signal or noise spectral measures. Thus, it
is not unreasonable to assume as we did in Section 3 that, while we are
uncertain about the spectrum, we still are able to make an accurate
estimate of the power of the process. Thus we assume
$m([-\pi,\pi]) = 2\pi w$, $\forall m \in \mathcal{M}$, where w is this constant
power.

If $\mathcal{M} \subset \mathcal{B}_+$ then we can define the upper measure, v,
of $\mathcal{M}$ as

$$v(A) = \sup\{m(A) \mid m \in \mathcal{M}\}$$

for each $A \in \mathcal{A}$. Clearly, if $m([-\pi,\pi]) = 2\pi w$
$\forall m \in \mathcal{M}$, for some $w < \infty$, then v satisfies

$$v(\emptyset) = 0, \quad v([-\pi,\pi]) < \infty, \tag{3.42}$$

$$A \subset B \Rightarrow v(A) \le v(B), \tag{3.43}$$

$$A_n \uparrow A \Rightarrow v(A_n) \uparrow v(A). \tag{3.44}$$

Of course we actually have $v([-\pi,\pi]) = 2\pi w$. If $\mathcal{M}$ is also
weak* compact then v satisfies ([6], Lemma 2.3)

$$F_n \downarrow F, \; F_n \text{ closed} \Rightarrow v(F_n) \downarrow v(F). \tag{3.45}$$

Any set function v on $\mathcal{A}$ satisfying (3.42)-(3.45) is called a
(Choquet) capacity [12]. If v further satisfies

$$v(A \cup B) + v(A \cap B) \le v(A) + v(B) \tag{3.46}$$

then v is called a 2-alternating capacity.

For a 2-alternating capacity v we define

$$\mathcal{M}_v = \{m \in \mathcal{B}_+ \mid m(A) \le v(A), \; \forall A \in \mathcal{A}; \; m([-\pi,\pi]) = v([-\pi,\pi])\}. \tag{3.47}$$

Robust noncausal (infinite-lag) smoothers have been developed for classes
of this form in [7]; and, in [32], a method of finding the rate-distortion
function for classes of discrete-parameter homogeneous Gaussian sources
whose spectra belong to a class of this form is developed. Also, classes
of probability measures having the form (3.47) were considered in detail
by Huber and Strassen [6] as uncertainty models for robust hypothesis
testing. Most importantly for our purposes, it was shown in [6] that
$\mathcal{M}_v$ is weak* compact (note that $\mathcal{M}_v$ is also clearly
convex), that the upper measure of $\mathcal{M}_v$ is v, and that three
commonly used uncertainty models have the form (3.47).

The first of these models is the $\epsilon$-contaminated model

$$\mathcal{M}_\epsilon = \{m \in \mathcal{B} \mid m(A) = (1-\epsilon) m^0(A) + \epsilon m'(A), \; \forall A \in \mathcal{A}; \; m^0([-\pi,\pi]) = m'([-\pi,\pi]); \; m' \in \mathcal{B}_+\} \tag{3.48}$$

where $\epsilon \in [0,1]$ and $m^0$ is a nominal measure. This has the form
(3.47) with $v(A) = (1-\epsilon) m^0(A) + \epsilon m^0([-\pi,\pi])$ for
$A \ne \emptyset$. Let $\lambda$ be a finite Borel measure on $[-\pi,\pi]$ such
that $m^0 \ll \lambda$ and $dm^0/d\lambda$ is bounded a.e. $[\lambda]$, and
consider the following set of spectral densities
$\{f = dm/d\lambda \mid m \in \mathcal{M}_\epsilon$ such that $m \ll \lambda$ and
f is bounded a.e. $[\lambda]\}$. Clearly this is nothing more than the
$\epsilon$-contaminated class $\mathcal{S}_\epsilon$ of PSD's defined in (3.6)
with $f_S^0 = dm^0/d\lambda$. Similarly, classes of measures can be defined to
correspond to the total variation class given in (3.7), the band model
given in (3.8), and the p-point model in (3.9). It was shown in [6] that
the total variation class of measures can be generated by a 2-alternating
capacity having the form

$$v(A) = \min\{m^0(A) + \epsilon m^0([-\pi,\pi]), \; m^0([-\pi,\pi])\}$$

for $A \ne \emptyset$.
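On a finite space the 2-alternating property (3.46) can be checked by brute force, which gives a feel for why the two capacities above qualify. The sketch below is only an illustration: it replaces $[-\pi,\pi]$ with a hypothetical four-point space, encodes subsets as bitmasks, and uses made-up nominal masses. It also includes a set function (the square of a measure) that fails (3.46), for contrast.

```python
def measure(mask, masses):
    """Mass of the subset encoded by the bits of `mask`."""
    return sum(w for i, w in enumerate(masses) if (mask >> i) & 1)


def eps_contamination(masses, eps):
    """v(A) = (1 - eps) m0(A) + eps m0(Omega) for nonempty A, and v(empty) = 0."""
    total = sum(masses)

    def v(mask):
        if mask == 0:
            return 0.0
        return (1.0 - eps) * measure(mask, masses) + eps * total
    return v


def total_variation(masses, eps):
    """v(A) = min{m0(A) + eps m0(Omega), m0(Omega)} for nonempty A."""
    total = sum(masses)

    def v(mask):
        if mask == 0:
            return 0.0
        return min(measure(mask, masses) + eps * total, total)
    return v


def is_two_alternating(v, n_points, tol=1e-12):
    """Brute-force check of (3.46) over all pairs of subsets."""
    size = 1 << n_points
    return all(
        v(a | b) + v(a & b) <= v(a) + v(b) + tol
        for a in range(size)
        for b in range(size)
    )


MASSES = [0.4, 0.3, 0.2, 0.1]   # hypothetical nominal measure on four points
```

Both capacities are concave functions of the nominal mass, which is what makes the inequality hold in this finite setting; the squared measure is convex in the mass and violates it on disjoint sets.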

It will be shown in Chapter IV that the band model can be generated
by a 2-alternating capacity. In fact the band model is a rather nice
special case. If we let $m^U(A) = \int_A f^U(\theta)\, d\lambda(\theta)$ and
$m^L(A) = \int_A f^L(\theta)\, d\lambda(\theta)$ for all $A \in \mathcal{A}$, where
$f^U$ and $f^L$ are the upper and lower PSD's of a band model as in (3.8),
then the corresponding band model of measures is given by

$$\mathcal{M}_B = \{m \in \mathcal{B}_+ \mid m^L(A) \le m(A) \le m^U(A), \; \forall A \in \mathcal{A}; \; m([-\pi,\pi]) = 2\pi w\} \tag{3.49}$$

where w is the constant power and $m^L$ and $m^U$ are lower and upper
measures, respectively, satisfying
$m^L([-\pi,\pi]) \le 2\pi w \le m^U([-\pi,\pi])$. So we see that for the band
model there is a one-to-one correspondence between the density version
and the measure version. Furthermore, if $f^U(\theta)$ does not satisfy
condition (ii) of Theorem 3.1 we can take the densities with respect to
$m^U$ instead of $\lambda$. Once we do this we have $f^U(\theta) = 1$,
$\forall \theta$, and the conditions of Theorem 3.1 will be satisfied. Hence a
most robust filter exists.

But since $\mathcal{M}_B$ is a 2-alternating capacity class (it is generated
by $v(A) = \min\{2\pi w - m^L(A^c), \; m^U(A)\}$, see Chapter IV) we have that
the conclusions of Theorem 3.3 also apply to the band model. Hence we have

Theorem 3.4. If $\mathcal{S}$ and $\mathcal{N}$ are band models as in (3.8), or
equivalently, as in (3.49), then there exists a pair
$(f_S^L, f_N^L) \in \mathcal{S} \times \mathcal{N}$ such that $(f_S^L, f_N^L)$ is
least favorable and $H_L^\dagger$, the optimal transfer function for
$(f_S^L, f_N^L)$, is a most robust transfer function.

Note that a singleton is, of course, a special case of a band model.

The p-point model is, unfortunately, not a capacity class and is
never weak* compact, but in Chapter V we will show that the weak* closure
of a p-point model is a 2-alternating capacity class and in many situations
the results for the closure can also be shown to hold for the actual
p-point class.

The last of the 2-alternating capacity classes considered in [6]
(the first two being the $\epsilon$-contaminated and total variation classes)
is called the Prokhorov class. It has the form

$$\mathcal{M}_P = \{m \in \mathcal{B}_+ \mid m(A) \le m^0(A^\delta) + \epsilon m^0([-\pi,\pi]), \; \forall A \in \mathcal{A}; \; m([-\pi,\pi]) = m^0([-\pi,\pi])\} \tag{3.50}$$

where $\delta \ge 0$ and $\epsilon \in [0,1]$, $m^0$ is a nominal measure, and
$A^\delta$ is the closed $\delta$-neighborhood of the set A, i.e.
$A^\delta = \{\theta \in [-\pi,\pi] \mid \inf_{a \in A} |\theta - a| \le \delta\}$. The
2-alternating capacity v which gives $\mathcal{M}_P$ the form (3.47) is
defined by letting

$$v(A) = \min\{m^0(A^\delta) + \epsilon m^0([-\pi,\pi]), \; m^0([-\pi,\pi])\}$$

for compact $A \ne \emptyset$ and, then, extending v to $\mathcal{A}$ via (3.44)
and (3.45).

While the Prokhorov class has no immediate intuitive appeal, it has some
nice theoretical properties; for example, the set consisting of all
classes having the form (3.50) (i.e. as $\delta$ varies over $[0, 2\pi]$ and
$\epsilon$ varies over [0,1]) forms a base for (i.e. generates) the weak*
topology (see [9]).

From the above discussions we see that many of the classes one might
use to model spectral uncertainty are 2-alternating capacity classes and
hence are weak* compact and convex. Thus we see that the hypothesis of
Theorem 3.3 is quite general.

7. Discussion

In  this  chapter,  we  have  considered  the  problems  of  robust  causal 
smoothing,  filtering  and  prediction  of  a  discrete-time  signal  in  noise  and 
the  special  case  of  robust  noiseless  one-step  prediction.  Our  formulation 
is analogous to that developed in [2] for robust noncausal continuous-time
filtering;  however,  the  proof  of  the  main  theorem  (Theorem  3.2)  required  a 
more  abstract  approach  since  no  completely  general  expressions  exist  for 
the  optimal  transfer  function  or  the  minimum  error.  This  same  difficulty 
has  prevented  us  from  proving  a  general  theorem  stating  that  if  the 
uncertainty  classes  of  Theorem  3.2  each  satisfy  power  constraints 


then  a  least  favorable  pair  of  PSD's  can  be  found  directly  from  the  (in 
many  cases,  known)  solutions  to  an  analogous  robust  hypothesis  testing 
problem.  Fortunately,  as  we  showed  for  the  robust  one-step  noiseless 
prediction  problem  in  Section  4  and  for  the  filtering  in  white  noise 
problem  in  Section  5,  Proposition  3.1  can  be  applied  to  yield  this  result 
in  many  cases.  For  example,  it  is  straightforward  to  see  from  the 
expression  given  in  Theorem  1  of  [3]  that  this  approach  will  work  for  the 
problem  of  robust  one-step  prediction  in  white  noise.  Many  other  cases 
could  be  handled  in  this  manner  or  by,  first,  proving  a  more  general 
version  of  Proposition  3.1  to  suit  the  other  cases  for  which  minimum 
error expressions have been developed [3], [15]-[19], [33], [34].

In  Section  6  we  saw  that  the  notion  of  a  2-alternating  capacity  is 
useful  as  a  general  model  of  uncertainty.  In  the  next  two  chapters  we 
consider capacities and some of their properties in detail.


I 

) 

4 
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IV.  A  GENERALIZATION  OF  THE  HUBER-STRASSEN  DERIVATIVE 

1.  Introduction 

As we discussed in Chapter III, Section 6, the formulation of
uncertainty in terms of classes of measures dominated by 2-alternating
Choquet capacities, first considered by Huber and Strassen [6], is quite
general. It includes most classes commonly used to model spectral
uncertainty or to model uncertainty in robust hypothesis testing. In [6],
Huber and Strassen develop the Neyman-Pearson Lemma for classes of
probability measures whose upper probabilities are 2-alternating
capacities. In particular, they prove the existence of a minimax test
statistic between two such classes (this statistic is actually a
derivative between the capacities which dominate these classes) and the
existence of a least favorable pair $(Q_0, Q_1)$ such that for each fixed
sample size the Neyman-Pearson tests between $Q_0$ and $Q_1$ constitute a
minimal essentially complete class of minimax tests between these two
classes. In addition to the obvious importance of these results in
unifying the theory of robust statistics, they have been used to obtain
several general results in robust statistical communication theory [7],
[32].

The one shortcoming of Huber and Strassen's fundamental paper is that
the capacities which generate the three classes of probability measures
most commonly used to model uncertainty in robust hypothesis testing (i.e.,
those generating the $\epsilon$-contaminated, total-variation and Prokhorov
neighborhoods) must be restricted to a compact space in order to satisfy
property (2.4) in the definition of a capacity given by Huber and
Strassen [6]. Property (2.4) insists that a capacity be continuous on
decreasing sequences of closed sets. In this chapter we relax this
restriction by only insisting that our set function (which, for lack of a
better term, we call a "generalized capacity") be continuous on decreasing
sequences of compact sets. This minor alteration allows us to consider
the three aforementioned neighborhoods on noncompact spaces.

For many robust statistical communication theory results it is of
interest to consider $\sigma$-finite as well as finite measures (the
generalization of Huber and Strassen's results to capacities v satisfying
$v(\Omega) < \infty$ rather than $v(\Omega) = 1$ being straightforward). For
example, in [7] spectral uncertainty for the problem of robust linear
smoothing is modeled via capacity classes of spectral distributions on
$\mathbb{R}^n$. This excludes the possibility of "white noise" whose
spectral measure is given by Lebesgue-Borel measure on $\mathbb{R}^n$.

The purpose of this chapter is to develop Huber-Strassen type results
for a 2-alternating generalized capacity class versus a $\sigma$-finite
measure. In Section 2 we give the definition of a 2-alternating
generalized capacity and some preliminaries. In Section 3 we construct the
Huber-Strassen derivative $\pi$ between a 2-alternating generalized capacity
and a $\sigma$-finite measure m and we prove that if a least favorable
distribution Q exists then $\pi = dQ/dm$ a.e. [m].

Our main theorem (Theorem 4.2) gives an easily verifiable necessary
and sufficient condition for the main result of Huber and Strassen [6]
to hold for a distribution Q which we construct from $\pi$. Corollary 4.1
states that this condition always holds if the generalized capacity is
actually a capacity. Thus, for the problem of a capacity versus a
$\sigma$-finite measure, we always have a least favorable distribution and a
Huber-Strassen derivative.

Section 5 contains some examples and applications. In particular, we
give an example of a situation in which the condition of the main theorem is
not satisfied. Also, we introduce a new 2-alternating capacity which
generates a widely used uncertainty class known as the band model. This
class is an accurate model of uncertainty for many applications.
Furthermore, the upper measure of such a class is a capacity even if the
sample space is not compact.

2. Generalized Choquet Capacities

Let $\Omega$ be a complete separable metrizable space, $\mathcal{A}$ its Borel
$\sigma$-algebra and $\mathcal{M}$ the set of all nonnegative finite Borel
measures on $\Omega$.

Definition 4.1: A set function v on $(\Omega, \mathcal{A})$ is a 2-alternating
generalized (Choquet) capacity if v satisfies

$$v(\emptyset) = 0, \quad v(\Omega) < \infty, \tag{4.1}$$

$$A \subset B \Rightarrow v(A) \le v(B), \tag{4.2}$$

$$A_n \uparrow A \Rightarrow v(A_n) \uparrow v(A), \tag{4.3}$$

$$F_n \downarrow F, \; F_n \text{ compact} \Rightarrow v(F_n) \downarrow v(F), \tag{4.4}$$

$$v(A \cup B) + v(A \cap B) \le v(A) + v(B). \tag{4.5}$$

Note that this definition "generalizes" the definition of a 2-alternating
capacity [6] by changing the condition "$F_n$ closed" to "$F_n$ compact" in
property (4.4).

For any 2-alternating generalized capacity v we define

$$\mathcal{M}_v = \{P \in \mathcal{M} \mid P(A) \le v(A), \; \forall A \in \mathcal{A}; \; P(\Omega) = v(\Omega)\}. \tag{4.6}$$

It was shown by Huber and Strassen [6] (Examples 3-5) that three of the
most important uncertainty classes in robust statistics (the
$\epsilon$-contamination, total variation and Prokhorov classes) have the form
(4.6) where v is a 2-alternating capacity only if $\Omega$ is compact (see
Chapter III). The motivation for the generalization in Definition 4.1 is
that these three set functions are 2-alternating generalized capacities
even if $\Omega$ is not compact. Moreover, the "special capacities"
considered by Rieder [20] are 2-alternating generalized capacities.
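The role of compactness can be seen in a simple example, sketched below under assumptions that are not taken from the text: take $\Omega = [0, \infty)$, let the nominal probability measure be the unit-rate exponential distribution, and let v be the $\epsilon$-contamination set function $v(A) = (1-\epsilon) P_0(A) + \epsilon$ for nonempty A. The closed sets $F_n = [n, \infty)$ decrease to the empty set, yet $v(F_n)$ tends to $\epsilon$ rather than to $v(\emptyset) = 0$, so continuity on decreasing closed sets fails. A decreasing sequence of compact sets with empty intersection must be empty from some index on, so the compact-set condition (4.4) is not affected.

```python
import math

V_EMPTY = 0.0   # v of the empty set


def v_tail(n, eps=0.1):
    """v(F_n) for F_n = [n, infinity) under eps-contamination of Exp(1).

    The nominal tail probability is exp(-n); F_n is nonempty for every n,
    so the contamination term eps is always present.
    """
    return (1.0 - eps) * math.exp(-n) + eps


def tail_values(count=60, eps=0.1):
    """The decreasing sequence v(F_0), v(F_1), ..., v(F_{count-1})."""
    return [v_tail(n, eps) for n in range(count)]
```

The sequence returned by `tail_values` decreases toward `eps`, not toward `V_EMPTY`.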

For all the results of this chapter we assume that $\Omega$ is
$\sigma$-compact.

The following extends Lemma 2.5 of [6] to generalized capacities on
$\sigma$-compact spaces.

Lemma 4.1: Let v be a 2-alternating generalized capacity on
$(\Omega, \mathcal{A})$. For each $A \in \mathcal{A}$, there is a
$Q \in \mathcal{M}_v$ such that $Q(A) = v(A)$.

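Proof: Since $\Omega$ is $\sigma$-compact, let $\{K_n\}$ be a sequence of
compact sets such that $K_n \uparrow \Omega$. We denote the restriction of v
to $K_n$ by $v^n$; i.e., $v^n(A) = v(A \cap K_n)$, $\forall A \in \mathcal{A}$;
$\forall n$. Clearly, for each n, $v^n$ is a 2-alternating capacity. Thus, we
may apply Lemma 2.5 of [6] to $v^n$.

Let $A \in \mathcal{A}$. Denote $A \cap K_n$ by $A^n$. From Lemma 2.5 of [6],
there exists $Q_1 \in \mathcal{M}_{v^1}$ such that $Q_1(A^1) = v^1(A^1)$ and,
for each $n \ge 2$, there exists $Q_n \in \mathcal{M}_{v^n}$ such that
$Q_n(A^n \setminus A^{n-1}) = v^n(A^n \setminus A^{n-1})$. For all
$B \in \mathcal{A}$, define

$$Q(B) = Q_1(B) + \sum_{n=2}^{\infty} \left\{ \frac{v(A^n) - v(A^{n-1})}{Q_n(A^n \setminus A^{n-1})}\, Q_n(B \cap A^n \setminus A^{n-1}) + \frac{v(K_n) - v(A^n) + v(A^{n-1}) - v(K_{n-1})}{Q_n[K_n \setminus (A^n \cup K_{n-1})]}\, Q_n[B \cap K_n \setminus (A^n \cup K_{n-1})] \right\}.$$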

Since $\Omega$ is assumed to be $\sigma$-compact, we can fix a sequence of
compact sets $K_n \uparrow \Omega$. For a 2-alternating generalized capacity v
and a $\sigma$-finite nonnegative measure m on $(\Omega, \mathcal{A})$ we denote
by $v^n$ and $m^n$ the restrictions of v and m, respectively, to the set
$K_n$. Clearly, for all n, $v^n$ is a 2-alternating capacity and $m^n$ is a
finite nonnegative measure on $(\Omega, \mathcal{A})$ (hence, $m^n$ is also a
2-alternating capacity).

Thus, for each n, the theory of Huber and Strassen [6] can be
applied to the pair $(v^n, m^n)$. In particular, for each n and for each
$t \in [0, \infty]$, there exists a set $A_t^n$ minimizing, over all
$A \in \mathcal{A}$, the set function $w_t^n(A) = t\, m^n(A) + v^n(A^c)$.
Further, for each n, there is a function $\pi^n$ (the Huber-Strassen
derivative of $v^n$ with respect to $m^n$) such that
$A_t^n = \{\pi^n > t\}$ for every $t \in [0, \infty]$. Finally, there is a least
favorable distribution $Q^n \in \mathcal{M}_{v^n}$, i.e.
$Q^n(\{\pi^n \le t\}) = v^n(\{\pi^n \le t\}) \ge P(\{\pi^n \le t\})$ for all
$P \in \mathcal{M}_{v^n}$ and for every $t \in [0, \infty]$.

Regarding  the  above  situation  we  have  the  following  lemma. 

Lemma 4.2: The collection of sets $\{A_t^n \mid t \in [0,\infty],\ n = 1,2,\ldots\}$ may be chosen
so that $A_t^{n+1} \subseteq A_t^n$ for each $t \in [0,\infty]$ and $n \ge 1$. Hence, for all $n$,
$\pi^{n+1}(x) \le \pi^n(x)$ for all $x \in \Omega$.

Proof: We first note that, for any $n$ and $t$, $w_t^n(A_t^n \cup K_n^c) = w_t^n(A_t^n)$, so we
may assume that $K_n^c \subseteq A_t^n$. Now fix $n$ and $t$; we will show that
$w_t^n(A_t^{n+1} \cup A_t^n) \le w_t^n(A_t^n)$, i.e. that

\[
t\,m[(A_t^{n+1} \cup A_t^n) \cap K_n] + v[(A_t^{n+1} \cup A_t^n)^c]
  \le t\,m(A_t^n \cap K_n) + v({A_t^n}^c). \tag{4.7}
\]

By adding $t\,m(K_{n+1} \setminus K_n)$ to both sides of (4.7) and using the additivity of $m$
and the fact that $K_n^c \subseteq A_t^n$ we obtain the following equivalent expression

\[
t\,m[(A_t^{n+1} \cup A_t^n) \cap K_{n+1}] + v[(A_t^{n+1} \cup A_t^n)^c]
  \le t\,m(A_t^n \cap K_{n+1}) + v({A_t^n}^c). \tag{4.8}
\]

We prove (4.8) (and hence the lemma) by first noting that the right hand
side of (4.8) is $w_t^{n+1}(A_t^n)$, then showing (using (4.5) and some straightforward
manipulations) that the left hand side is less than or equal to $w_t^{n+1}(A_t^n)$
plus $w_t^{n+1}(A_t^{n+1}) - w_t^{n+1}(A_t^{n+1} \cap A_t^n)$ and finally noting that this additional
term is nonpositive.
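
The manipulations referred to in the proof can be sketched as follows. This is our reading of the argument, not a reproduction of the original derivation: we abbreviate $A = A_t^n$, $B = A_t^{n+1}$ and $w = w_t^{n+1}$ (shorthand introduced only for this sketch), and we use only the additivity of $m^{n+1}$ and the 2-alternating inequality (4.5) for $v^{n+1}$.

```latex
% Sketch: w = w_t^{n+1} is submodular; A = A_t^n, B = A_t^{n+1} are shorthand.
\begin{align*}
t\,m^{n+1}(A \cup B) + t\,m^{n+1}(A \cap B)
  &= t\,m^{n+1}(A) + t\,m^{n+1}(B)
  && \text{(additivity of } m^{n+1}\text{)}, \\
v^{n+1}\bigl((A \cup B)^c\bigr) + v^{n+1}\bigl((A \cap B)^c\bigr)
  &\le v^{n+1}(A^c) + v^{n+1}(B^c)
  && \text{((4.5) applied to } A^c \text{ and } B^c\text{)}.
\end{align*}
% Adding the two lines and moving w(A \cap B) to the right:
\[
w(A \cup B) \le w(A) + \bigl[\, w(B) - w(A \cap B) \,\bigr],
\]
% and the bracket is nonpositive because B = A_t^{n+1} minimizes w = w_t^{n+1}.
```

The second line uses $(A \cup B)^c = A^c \cap B^c$ and $(A \cap B)^c = A^c \cup B^c$, so it is exactly (4.5) written for the complements.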

Lemma 4.2 allows us to define $\pi(x) = \lim_{n \to \infty} \pi^n(x)$ for each $x \in \Omega$.
Further, it implies that the definition of $\pi(x)$ does not depend on the
choice of the sequence $\{K_n\}$, since for any alternative sequence $\{K_n'\}$ the
ordered (by set inclusion) union of the two sequences must produce the
same limit as each of the original two.

The following theorem justifies our defining $\pi$ as the Huber-Strassen
derivative of $v$ with respect to $m$.

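Theorem 4.1: Let $v$ be a 2-alternating generalized capacity and $m$ a
$\sigma$-finite nonnegative measure on $(\Omega, \mathcal{A})$, and let $\pi$ be defined as above.
If there exists a measure $Q \in \mathcal{M}_v$ such that $Q$ is least favorable for $\mathcal{M}_v$
versus $m$, i.e. $\forall t \in [0,\infty]$

\[
Q\left(\left\{\frac{dQ}{dm} \le t\right\}\right) = v\left(\left\{\frac{dQ}{dm} \le t\right\}\right), \tag{4.9}
\]

then $\pi = \dfrac{dQ}{dm}$ a.e. $[m]$.
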


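Proof: For every $t \in (0,\infty]$, there exists $t_k \uparrow t$. Since $t_k \uparrow t$ implies
$\{dQ/dm \le t_k\} \uparrow \{dQ/dm < t\}$, (4.9) implies that $\forall t \in (0,\infty]$

\[
Q\left(\left\{\frac{dQ}{dm} < t\right\}\right) = v\left(\left\{\frac{dQ}{dm} < t\right\}\right). \tag{4.10}
\]

Also $t_k \uparrow t$ implies that, $\forall n$, $\{\pi^n \le t_k\} \uparrow \{\pi^n < t\}$. This fact, property
(4.3) and the definition of $A_t^n = \{\pi^n > t\}$ imply that, $\forall t \in (0,\infty]$ and $\forall n$,

\[
t\,m^n(\{\pi^n \ge t\}) + v^n(\{\pi^n < t\})
  \le t\,m^n\left(\left\{\frac{dQ}{dm} \ge t\right\}\right) + v^n\left(\left\{\frac{dQ}{dm} < t\right\}\right). \tag{4.11}
\]

Since the right hand side of (4.11) is clearly less than or equal to
$t\,m(\{dQ/dm \ge t\}) + v(\{dQ/dm < t\})$ and $n \to \infty$ implies $\{\pi^n < t\} \uparrow \{\pi < t\}$,
(4.11) and (4.3) imply

\[
t\,m(\{\pi \ge t\}) + v(\{\pi < t\})
  \le t\,m\left(\left\{\frac{dQ}{dm} \ge t\right\}\right) + v\left(\left\{\frac{dQ}{dm} < t\right\}\right). \tag{4.12}
\]

By (4.10) the right hand side of (4.12) is equal to $t\,m(\{dQ/dm \ge t\}) +
Q(\{dQ/dm < t\})$. Since $Q \in \mathcal{M}_v$, the left hand side of (4.12) is greater than
or equal to $t\,m(\{\pi \ge t\}) + Q(\{\pi < t\})$. Thus we have, $\forall t \in (0,\infty]$

\[
t\,m(\{\pi \ge t\}) + Q(\{\pi < t\})
  \le t\,m\left(\left\{\frac{dQ}{dm} \ge t\right\}\right) + Q\left(\left\{\frac{dQ}{dm} < t\right\}\right). \tag{4.13}
\]

Since, for any $t \in [0,\infty)$, $t_k \downarrow t$ implies $\{\pi \ge t_k\} \uparrow \{\pi > t\}$ and
$\{dQ/dm \ge t_k\} \uparrow \{dQ/dm > t\}$, we have from (4.13) and the continuity of
measures that

\[
t\,m(\{\pi > t\}) + Q(\{\pi \le t\})
  \le t\,m\left(\left\{\frac{dQ}{dm} > t\right\}\right) + Q\left(\left\{\frac{dQ}{dm} \le t\right\}\right), \tag{4.14}
\]

$\forall t \in [0,\infty]$. By the uniqueness of the Radon-Nikodym derivative between two
measures (see, for example, Royden [38]) we have that $\pi = dQ/dm$ a.e. $[m]$
(cf. the remarks near the end of Section 3 in [6]). QED.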


4.  The  Least  Favorable  Distribution 

By Fatou's Lemma (see, for example, Royden [38], Proposition 17,
p. 231) we have, for each $A \in \mathcal{A}$, that

\[
\int_A \pi\,dm \le \liminf_{n \to \infty} \int_A \pi^n\,dm^n \le \liminf_{n \to \infty} Q^n(A), \tag{4.15}
\]

where $m^n$, $\pi^n$, and $Q^n$ are defined at the beginning of Section 3. By Lemma
4.1 there is a distribution $\tilde{Q} \in \mathcal{M}_v$ such that $\tilde{Q}(\{\pi < \infty\}) = v(\{\pi < \infty\})$. We
define the distribution $Q$ on $(\Omega, \mathcal{A})$ by

\[
Q(A) = \int_A \pi\,dm + \tilde{Q}(A \cap \{\pi = \infty\}), \tag{4.16}
\]

for each $A \in \mathcal{A}$. If there exists a least favorable distribution then Theorem
4.1 and the Fundamental Theorem of Calculus imply that the $Q$ given in (4.16) is
also least favorable. Furthermore, we have

Theorem 4.2: Assume $v$ is a 2-alternating generalized capacity and $m$ is a
$\sigma$-finite nonnegative measure on $(\Omega, \mathcal{A})$. Let $\pi$ be defined as in Section 3
and let $Q$ be defined by (4.16). If $Q(\{\pi < \infty\}) = v(\{\pi < \infty\})$ (or equivalently
if $Q(\Omega) = v(\Omega)$) then $Q$ is a least favorable distribution for $\mathcal{M}_v$ versus $m$;
i.e., for every $t \in [0,\infty]$

\[
Q(\{\pi \le t\}) = v(\{\pi \le t\}) \tag{4.17}
\]

and $\pi$ is a version of $dQ/dm$.

Note that the hypothesis is also trivially necessary. Also, note that
if $m(\Omega) < \infty$ and the conclusion of Theorem 4.2 holds then, by [20,
Proposition 2.1], $Q$ is least favorable for any minimax testing problem for
$\mathcal{M}_v$ versus $\{m\}$.

Proof: From the definition of $Q^n$, the fact that $t_k \uparrow \infty$ implies $\{\pi \le t_k\} \uparrow$
$\{\pi < \infty\}$, and property (4.3) we have that $Q^n(\{\pi^n < \infty\}) = v^n(\{\pi^n < \infty\}) \uparrow
v(\{\pi < \infty\})$. Thus, the hypothesis of the theorem implies that $Q^n(\{\pi^n < \infty\})
\uparrow Q(\{\pi < \infty\})$; i.e.

\[
\lim_{n \to \infty} \int_\Omega \pi^n\,dm^n = \int_\Omega \pi\,dm.
\]

For any set $A \in \mathcal{A}$, we apply the Generalized Dominated Convergence Theorem
(Royden [38], Proposition 18, p. 232) to the sequence $\{\pi^n|_A \mid n = 1,2,\ldots\}$
($\pi^n|_A$ is the restriction of $\pi^n$ to $A$) to obtain
$\lim_{n \to \infty} \int_A \pi^n\,dm^n = \int_A \pi\,dm$; i.e.,

\[
\lim_{n \to \infty} Q^n(A) = Q(A), \quad \forall A \in \mathcal{A}. \tag{4.18}
\]

If we set $A = \{\pi < t\} = \bigcup_{k=1}^{\infty} \{\pi^k < t\}$ in (4.18), we have

\[
Q(\{\pi < t\}) = \lim_{n \to \infty} Q^n\Bigl(\bigcup_{k=1}^{\infty} \{\pi^k < t\}\Bigr)
  \ge \lim_{n \to \infty} Q^n(\{\pi^n < t\})
  = \lim_{n \to \infty} v^n(\{\pi^n < t\}) = v(\{\pi < t\}).
\]

Thus, we have $Q(\{\pi < t\}) \ge P(\{\pi < t\})$, for all $P \in \mathcal{M}_v$ and all $t \in [0,\infty]$.
From property (4.3) we have $Q(\{\pi \le t\}) \ge P(\{\pi \le t\})$, for all $P \in \mathcal{M}_v$ and all
$t \in [0,\infty]$, since $t_k \downarrow t$ implies $\{\pi < t_k\} \downarrow \{\pi \le t\}$. But, by Lemma 4.1,
for each $t \in [0,\infty]$ there is some $P_t \in \mathcal{M}_v$ such that $P_t(\{\pi \le t\}) = v(\{\pi \le t\})$
so we must have $Q(\{\pi \le t\}) = v(\{\pi \le t\})$, $\forall t \in [0,\infty]$. QED.


Corollary 4.1: If $v$ is a 2-alternating capacity and $m$, $\pi$ and $Q$ are as above
then $Q(\{\pi < \infty\}) = v(\{\pi < \infty\})$; hence, the conclusions of Theorem 4.2 hold.


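Proof: We wish to show that $\int \pi^n\,dm^n \to \int \pi\,dm$. Let $\tilde{\pi}^n = \pi^n$ on $K_n$
and $0$ on $K_n^c$; we will show $\int \tilde{\pi}^n\,dm \to \int \pi\,dm$. By the Vitali
convergence theorem (see Dunford and Schwartz [10], pp. 150, 173) this happens
if for any sequence $\{E_k\} \subset \mathcal{A}$ such that $E_k \downarrow \emptyset$ we have
$\lim_{k \to \infty} \int_{E_k} \tilde{\pi}^n\,dm = 0$ uniformly in $n$. To prove this we use the fact
that a capacity class is tight (Huber and Strassen [6], Lemma 2.2) to pick
$K_N$ (given $\epsilon > 0$) such that $\int_{K_N^c} \tilde{\pi}^n\,dm \le Q^n(K_N^c) < \epsilon/2$ for all $n$.
Then since $\{\tilde{\pi}^n \mid n = N, N+1, \ldots\}$ is decreasing on $K_N$ we have that

\[
\int_{E_k \cap K_N} \tilde{\pi}^n\,dm \le \int_{E_k \cap K_N} \tilde{\pi}^N\,dm \le Q^N(E_k \cap K_N).
\]

Now for each $n = 1, \ldots, N$ there is an $L_{n,\epsilon}$ such that, $\forall k \ge L_{n,\epsilon}$,
$Q^n(E_k) < \epsilon/2$. Now let $L_\epsilon = \max\{L_{1,\epsilon}, \ldots, L_{N,\epsilon}\}$; we have that, for all
$k \ge L_\epsilon$,

\[
\int_{E_k} \tilde{\pi}^n\,dm \le \int_{K_N^c} \tilde{\pi}^n\,dm + \int_{E_k \cap K_N} \tilde{\pi}^n\,dm < \epsilon/2 + \epsilon/2,
\]

for every $n$.
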

5.  Examples  and  Applications 


Our  first  example  shows  a  situation  in  which  the  hypothesis  of  Theorem 
4.2  fails  to  hold. 

Let $\Omega = \mathbb{R}$ and let $v$ be the 2-alternating generalized capacity which
setwise dominates an $\epsilon$-contaminated neighborhood of probability
distributions, i.e., let $\epsilon \in (0,1)$ and let $v(A) = (1-\epsilon)P(A) + \epsilon$, for $A \ne \emptyset$, where
$P$ is a probability measure on $(\mathbb{R}, \mathcal{A})$. Also, let $m$ be Lebesgue-Borel
measure on $(\mathbb{R}, \mathcal{A})$. Assume that $P$ has a density $p$ with respect to $m$.

Choose $K_n \uparrow \mathbb{R}$ with $K_n$ compact $\forall n$. From [14], Section 3, we have
that $\pi^n(x) = \max\{c_n, (1-\epsilon)p(x)\}$, $\forall x \in K_n$, where $c_n \ge 0$ can be chosen so that
$\int_{K_n} \pi^n\,dm = v(K_n)$. It is straightforward to show that $\lim_{n \to \infty} c_n = 0$. Thus,
$\pi(x) = (1-\epsilon)p(x)$, $\forall x \in \mathbb{R}$, and we have that $\{\pi < \infty\} = \mathbb{R}$ and

\[
Q(\{\pi < \infty\}) = \int_{\mathbb{R}} \pi\,dm = 1 - \epsilon < v(\{\pi < \infty\}).
\]
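
The behaviour of the constants $c_n$ can be checked numerically. The following is a minimal sketch under assumptions that are ours, not the text's: $P$ is taken to be standard normal, $K_n = [-n, n]$ and $\epsilon = 0.1$, and the function names are hypothetical. The condition $\int_{K_n} \pi^n\,dm = v(K_n)$ is equivalent to requiring that the mass added by the floor $c_n$, namely $\int_{K_n} \max\{c_n - (1-\epsilon)p, 0\}\,dm$, equal $\epsilon$, and we solve that equation by bisection.

```python
import math

def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def added_mass(c, n, eps):
    """Integral over K_n = [-n, n] of max(c - (1 - eps) * p(x), 0), p standard normal."""
    peak = (1.0 - eps) / math.sqrt(2.0 * math.pi)  # value of (1 - eps) * p at x = 0
    if c <= 0.0:
        return 0.0
    # (1 - eps) * p(x) < c exactly when |x| > a
    a = 0.0 if c >= peak else math.sqrt(-2.0 * math.log(c / peak))
    if a >= n:
        return 0.0
    return 2.0 * (c * (n - a) - (1.0 - eps) * (normal_cdf(n) - normal_cdf(a)))

def solve_c(n, eps, iters=200):
    """Bisection for c_n such that the integral of pi^n over K_n equals v(K_n)."""
    peak = (1.0 - eps) / math.sqrt(2.0 * math.pi)
    lo, hi = 0.0, max(peak, 1.0 / (2.0 * n))  # added_mass(hi) >= eps by construction
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if added_mass(mid, n, eps) < eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

eps = 0.1
levels = {n: solve_c(n, eps) for n in (1, 10, 100, 1000)}
for n, c in levels.items():
    print(f"n = {n:5d}   c_n = {c:.6e}")
```

In this sketch the computed floors shrink as $K_n$ grows, while the limiting function $(1-\epsilon)p$ integrates to $1 - \epsilon = 0.9$, strictly less than $v(\mathbb{R}) = 1$.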

Intuitively each least favorable $Q^n$ tries to be as much as possible
like Lebesgue-Borel measure by flattening the tails of $\pi^n$ with the $\epsilon$ of
contamination; but, because $\mathbb{R}$ is not compact, this $\epsilon$ of contamination is
allowed to slip off the ends of the real line as $n \to \infty$.

On the other hand, as we mentioned earlier, all the neighborhoods of
probability distributions commonly used to model inexact knowledge in
robust hypothesis testing are 2-alternating generalized capacities even
if the sample space $\Omega$ is not compact. Since least favorable pairs have
been found for all these classes [14], [27], [20], Theorem 4.1 implies that,
for any of these specific but important examples, we have succeeded in
extending the main result of Huber and Strassen [6] to a generalized
capacity class versus a finite measure on a $\sigma$-compact space. Further,
Corollary 4.1 implies that this is true for any 2-alternating capacity
class versus any $\sigma$-finite measure on a $\sigma$-compact space. To illustrate
this result we introduce the following example of a 2-alternating
capacity on a noncompact space.

Let $m_L$ and $m_U$ be finite nonnegative measures on $(\Omega, \mathcal{A})$ such that

\[
m_L(A) \le m_U(A), \quad \forall A \in \mathcal{A}. \tag{4.19}
\]

Define

\[
v(A) = \min\{w - m_L(A^c),\ m_U(A)\}, \tag{4.20}
\]

for $A \in \mathcal{A}$, where $w$ is a positive constant such that $m_L(\Omega) \le w \le m_U(\Omega)$. For
this $v$, $\mathcal{M}_v$ is given by

\[
\{m \in \mathcal{M} \mid m_L(A) \le m(A) \le m_U(A),\ \forall A \in \mathcal{A};\ m(\Omega) = w\}. \tag{4.21}
\]

This class is called the band model. Note that every element of $\mathcal{M}_v$ is
absolutely continuous with respect to $m_U$, so if $w = 1$, $\mathcal{M}_v$ is equivalent to
the class of pdf's considered by Kassam [21]. Most importantly, least
favorable pairs are given for these classes by Kassam [21].

Classes of the form (4.21) arise naturally as a confidence band around
an estimate of a pdf (where, of course, $w = 1$) or around an estimate of a
spectral density (where $w/2\pi$ is the known power of the process). Also,
as has been noted by Kassam [21], if we define $\epsilon = 1 - m_L(\Omega)/w$ then the
band model (4.21) contains those elements of an $\epsilon$-contaminated class which
are bounded above by $m_U$. Thus, the $\epsilon$-contaminated class may be thought
of as a limit of classes which are band models.
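
On a finite sample space the set function (4.20) can be examined exhaustively. The sketch below is only a numerical sanity check, not a proof: the four-point space, the point masses assigned to $m_L$ and $m_U$, and all function names are hypothetical choices of ours. It tests the 2-alternating inequality $v(A \cup B) + v(A \cap B) \le v(A) + v(B)$ and monotonicity over every pair of events.

```python
from itertools import combinations

def subsets(points):
    """All subsets of a finite collection, as frozensets."""
    pts = list(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]

def measure(weights, A):
    """Measure of A under the point masses in `weights`."""
    return sum(weights[x] for x in A)

def band_capacity(m_lower, m_upper, w):
    """Return v(A) = min{w - m_L(A^c), m_U(A)} as in (4.20) on a finite space."""
    omega = frozenset(m_lower)
    def v(A):
        return min(w - measure(m_lower, omega - A), measure(m_upper, A))
    return v

# Hypothetical point masses with m_L <= m_U pointwise and m_L(Omega) <= w <= m_U(Omega).
m_L = {0: 0.10, 1: 0.05, 2: 0.20, 3: 0.15}
m_U = {0: 0.40, 1: 0.30, 2: 0.45, 3: 0.25}
w = 1.0
v = band_capacity(m_L, m_U, w)

events = subsets(m_L)
tol = 1e-12  # slack for floating-point rounding
two_alternating = all(
    v(A | B) + v(A & B) <= v(A) + v(B) + tol for A in events for B in events
)
monotone = all(v(A) <= v(B) + tol for A in events for B in events if A <= B)
print("2-alternating:", two_alternating, " monotone:", monotone)
```

Changing the weights so that (4.19) fails at some point is an easy way to see that the ordering $m_L \le m_U$ is what the inequality relies on.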

Proposition 4.1: The set function $v$ in (4.20), which defines the band
model (4.21), is a 2-alternating capacity even if $\Omega$ is not compact.

Proof: The 2-alternating property (given by (4.5)) is the only defining
property which is not straightforward. We handle (4.5) case by case. If
$A$ and $B$ are such that $v(A) = m_U(A)$ and $v(B) = m_U(B)$ then, by (4.20),
$v(A \cup B) + v(A \cap B) \le m_U(A \cup B) + m_U(A \cap B) = m_U(A) + m_U(B) = v(A) + v(B)$.

Similarly, if $v(A) = w - m_L(A^c) = m_L(A) + (w - m_L(\Omega))$ and $v(B) = m_L(B) +
(w - m_L(\Omega))$ then (4.5) holds. If, say, $v(A) = m_L(A) + (w - m_L(\Omega))$ and $v(B) =
m_U(B)$ ($A$ and $B$ are interchangeable) then we must have $v(A \cap B) = m_U(A \cap B)$
and $v(A \cup B) = m_L(A \cup B) + (w - m_L(\Omega))$ because if $m_L(A \cap B) + (w - m_L(\Omega)) <
m_U(A \cap B)$ then, since $m_L(B) + (w - m_L(\Omega)) \ge m_U(B)$, we have (by subtraction)
that $m_L(B \setminus A) > m_U(B \setminus A)$ which contradicts (4.19). (The possibility of
$m_U(A \cup B) < m_L(A \cup B) + (w - m_L(\Omega))$ is similarly disallowed.) The one
possible case is $v(A \cup B) = m_L(A \cup B) + (w - m_L(\Omega))$ and $v(A \cap B) = m_U(A \cap B)$.
In this case (4.5) is equivalent to $m_L(B \setminus A) \le m_U(B \setminus A)$ which always holds by
(4.19). Thus $v$ of (4.20) which gives rise to the band model (4.21) is a
2-alternating capacity. QED.

Theorem 2.2 of [7] states that if $v_S$ and $v_N$ are 2-alternating
capacities on $\mathbb{R}^n$ and $\pi$ is a version of the Huber-Strassen derivative
between $v_S$ and $v_N$ then $h^* = \pi/(1+\pi)$ is a minimax linear smoother for
$\mathcal{M}_{v_S}$ and $\mathcal{M}_{v_N}$, where $\mathcal{M}_{v_S}$ and $\mathcal{M}_{v_N}$ are uncertainty classes of signal and
noise spectral measures, respectively. A careful examination of the proof
of this result shows that it holds for any two set functions $v_S$ and $v_N$ for
which the conclusions of Theorem 4.1 of Huber and Strassen [6] hold.

Thus we see from Corollary 4.1 that Theorem 2.2 of [7] holds for $v_S$
and $m_N$ where $v_S$ is a 2-alternating capacity and $m_N$ is a nonnegative
$\sigma$-finite Borel measure on $\mathbb{R}^n$. Unquestionably, the most important examples
to which this extension can be applied are those where $m_N$ is Lebesgue-Borel
measure on $\mathbb{R}^n$, which is the spectral measure of continuous-parameter white
noise.

V. ON THE P-POINT UNCERTAINTY CLASS

1.  Introduction 

As  we  have  discussed  in  the  preceding  chapters,  there  are  many 
applications of statistical communication theory for which it is inappropriate
to assume that we have exact knowledge of some underlying spectral
distribution  or  probability  distribution.  A  common  approach  to  such 
situations  (and  the  one  we  have  used  in  this  thesis)  involves  choosing 
classes  of  distributions  which  accurately  model  this  uncertainty.  We  have 
also  discussed  the  fact  that  a  number  of  results  have  been  obtained  for 
situations  where  uncertainty  is  modeled  via  classes  whose  upper  measures 
are  2-alternating  Choquet  capacities  (see  [6],  [7],  [32]  and  Section  6  of 
Chapter  III) . 

The  significance  of  these  results  is  twofold.  First,  in  each  case, 
general  existence  results  are  given  which  unify  the  theory  involved 
(especially  considering  the  generality  of  modeling  uncertainty  via 
2-alternating  capacity  classes;  a  topic  we  will  discuss  below).  Second, 
least  favorable  pairs  for  the  robust  hypothesis  testing  problem  [6]  have 
already  been  found  for  each  of  the  classes  which  have  been  shown  to  be 
2-alternating  capacity  classes  and  the  results  given  in  [32]  and  [7] 
allow  us  to  solve  the  problems  considered  therein  directly  from  such 
least  favorable  pairs. 

Among the classes which have been used to model uncertainty (and for
which least favorable pairs have been found) are the $\epsilon$-contaminated,
total-variation, Prokhorov and band models. The first three are shown to be
2-alternating capacity classes (when restricted to have compact support)
in [6]. The band model was shown to be a capacity class in Chapter IV.
Furthermore, least favorable pairs are known for these classes (see [14]
and [27] and [21] for the band model).


One class which has been used to model uncertainty in detection
problems [5], [8], rate-distortion problems [41] and robust smoothing
problems [25] which is not a 2-alternating capacity class is given by

\[
\mathcal{P}_{pp} = \{P \in \mathcal{M} \mid P(A_j) = w_j,\ j = 1, \ldots, n\} \tag{5.1}
\]

where $\mathcal{M}$ is the set of all finite nonnegative Borel measures on a compact
Polish space $\Omega$, the $w_j$'s are positive real constants, and $\{A_j \mid j = 1, \ldots, n\}$
is a fixed partition of $\Omega$ such that each $A_j$ is a Borel set with nonempty
interior. If $\Omega$ is a compact subset of the real line, if each of the $A_j$'s
is the union of a symmetric pair of intervals and if the $P$'s are spectral
distributions then $\mathcal{P}_{pp}$ is essentially the form of the class referred to
by Sakrison [41] as "class b" and by Cimini and Kassam [25] as a "p-point
class." The term p-point class is also used by El-Sawy and VandeLinde
[5], [8] for $\mathcal{P}_{pp}$ when the $P$'s are probability distributions and, of
course, $\sum_{j=1}^{n} w_j = 1$. The class given in (5.1) is an appropriate model of
spectral uncertainty when, for example, we are able to use power measurements
from a bank of low-pass filters [41], and it is an appropriate
model for probabilistic uncertainty since $P([-a,a])$ "is one of the most
easily measured parameters of a distribution" [5, p. 725].

In this chapter we consider uncertainty classes of the form $\mathcal{P}_{pp}$
(which we henceforth refer to as p-point classes). In particular, we
show how, in many cases, the results of [6], [7], [32] can be applied to
these p-point classes by embedding them in slightly larger 2-alternating
capacity classes which we term extended p-point classes (see below).


2.  Development 


Unfortunately, the p-point class $\mathcal{P}_{pp}$ is not weak* compact (see
Chapter III for a definition of the weak* topology) and, as we mentioned
in Section 1, the set function $v_{pp}$ defined, for each $A$, by

\[
v_{pp}(A) = \sup_{P \in \mathcal{P}_{pp}} P(A) \tag{5.2}
\]

is not a capacity. (For a class $\mathcal{P} \subset \mathcal{M}$, the set function $v$ defined as
the setwise supremum over $P \in \mathcal{P}$ is called the upper measure of $\mathcal{P}$ and
Huber and Strassen have shown [6, Lemma 2.5] that if $v$ is a 2-alternating
capacity then $v$ is the upper measure of the set $\mathcal{P}_v = \{P \in \mathcal{M} \mid P(A) \le v(A),
\forall A,\ P(\Omega) = v(\Omega)\}$.) Basically, the reason for this is that $\mathcal{P}_{pp}$ is not
weak* closed. To illustrate this, suppose $\Omega = [0,2]$, $n = 2$, $A_1 = [0,1]$,
$A_2 = (1,2]$ and $w_1 = w_2 = 1/2$; then for each $k \ge 1$, the probability distribution
$P_k$ defined by $P_k(\{1\}) = 1/2$ and $P_k(\{1 + 1/k\}) = 1/2$ is an element of $\mathcal{P}_{pp}$. But
$\{P_k\}$ converges weak* to $P_0$, where $P_0(\{1\}) = 1$. Clearly $P_0 \notin \mathcal{P}_{pp}$.
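
The counterexample is small enough to check directly. The sketch below represents the discrete measures $P_k$ and $P_0$ as dictionaries of atoms and weights; the helper names and the choice of the test function $f(x) = x^2$ are ours, introduced only for illustration. A single test function does not establish weak* convergence, but it illustrates how $\int f\,dP_k$ approaches $\int f\,dP_0$ while membership in the class is lost in the limit.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def in_A1(x):
    """A_1 = [0, 1]."""
    return 0 <= x <= 1

def in_A2(x):
    """A_2 = (1, 2]."""
    return 1 < x <= 2

def mass(P, indicator):
    """P(A) for a discrete measure P given as {atom: weight}."""
    return sum(wt for x, wt in P.items() if indicator(x))

def in_p_point_class(P):
    """Membership in the p-point class (5.1) with w_1 = w_2 = 1/2."""
    return mass(P, in_A1) == HALF and mass(P, in_A2) == HALF

def P_k(k):
    """Atoms of weight 1/2 at 1 and at 1 + 1/k."""
    return {Fraction(1): HALF, 1 + Fraction(1, k): HALF}

P_0 = {Fraction(1): Fraction(1)}  # candidate weak* limit: unit mass at 1

def integrate(f, P):
    """Integral of f against the discrete measure P."""
    return sum(wt * f(x) for x, wt in P.items())

f = lambda x: x * x  # one continuous test function on [0, 2]
gaps = [abs(integrate(f, P_k(k)) - integrate(f, P_0)) for k in (1, 10, 100, 1000)]

print([in_p_point_class(P_k(k)) for k in (1, 10, 100)], in_p_point_class(P_0))
print([float(g) for g in gaps])
```

Exact rational arithmetic is used so that the membership test $P(A_j) = w_j$ is not affected by rounding.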

We now consider a new uncertainty class which we term the extended
p-point class. It is given by

\[
\bar{\mathcal{P}}_{pp} = \Bigl\{P \in \mathcal{M} \Bigm| P(A_j^\circ) \le w_j,\ P(\bar{A}_j) \ge w_j,\ j = 1, \ldots, n,\
  P(\Omega) = \sum_j w_j \Bigr\} \tag{5.3}
\]

where $A_j^\circ$ is the (nonempty) interior of $A_j$ and $\bar{A}_j$ is the closure. We use
the notation $\bar{\mathcal{P}}_{pp}$ because this class is actually the weak* closure of $\mathcal{P}_{pp}$.

The upper measure of $\bar{\mathcal{P}}_{pp}$ is given by



0 

w , 

3 
n 

_  .  £  w 

VePP(A)  =  i  k=l 


if  a  M; 

if  A  C  A1? ,  j  =  l, .  .  .  ,n; 

if  AC  A?  and  A  <£  A°  for 
k-1, . . . ,n,  j  1 , . . . ,n , 


k^j 


n 

£  w . 


PP 


(5.4) 


otherwise : 


which is a 2-alternating capacity (recall that $\Omega$ is compact here). Thus, the
importance of this extended p-point class is that the results of [6], [7], [32]
can be applied to it. Furthermore, as we will show below, the least
favorable pairs for two nonintersecting extended p-point classes are also
contained in the corresponding p-point classes and, hence, are also least
favorable for the p-point classes.

We now consider the problem of a p-point class versus a single
distribution. This is relevant for tests between a composite hypothesis and a
simple alternative and robust smoothing of an uncertain signal in noise
(e.g. band limited white noise). Moreover, Poor [32] has shown that if
$\mathcal{P}_v$ is a 2-alternating capacity class of spectral distributions then the
spectral distribution $Q \in \mathcal{P}_v$ which is least favorable for $\mathcal{P}_v$ versus
Lebesgue-Borel measure on $[-\pi, \pi]$ (least favorable in the sense discussed
in Chapter IV) has rate-distortion function equal to the rate-distortion
function over $\mathcal{P}_v$.


Let $\bar{\mathcal{P}}_{pp}$ be an extended p-point class as in (5.3) and let $P_0 \in \mathcal{M}$ be
such that $P_0 \notin \bar{\mathcal{P}}_{pp}$. For ease of exposition we assume that $\Omega$ is a
compact subset of $\mathbb{R}^n$ and that $P_0$ has a density $p_0$ with respect to Lebesgue
measure. It is fairly straightforward to see from the definition of
$dv_1/dv_0$ given in [6] that $d\bar{v}_{pp}/dP_0$ (where $\bar{v}_{pp}$ has the form (5.4) and is
the capacity dominating $\bar{\mathcal{P}}_{pp}$) must be constant a.e. $[P_0]$ on each of the
$A_j$'s. Thus, a least favorable $Q \in \bar{\mathcal{P}}_{pp}$ can be given in terms of its
density with respect to Lebesgue measure:

\[
q(x) = \frac{w_j}{P_0(A_j)}\,p_0(x), \quad \forall x \in A_j,\ j = 1, \ldots, n. \tag{5.5}
\]

Thus we have that

\[
Q\left(\left\{\frac{dQ}{dP_0} \le t\right\}\right) \ge P\left(\left\{\frac{dQ}{dP_0} \le t\right\}\right) \tag{5.6}
\]

for all $P \in \bar{\mathcal{P}}_{pp}$. Hence, (5.6) holds for all $P \in \mathcal{P}_{pp}$ (where $\mathcal{P}_{pp}$ is given
in (5.1)) and, since $Q \in \mathcal{P}_{pp}$, we have that $Q$ (with density given in (5.5)) is
least favorable for the p-point class $\mathcal{P}_{pp}$ versus the single measure $P_0$.

Note that (5.5) agrees with the form given by Sakrison [41, equation (37)]
for the spectrum achieving the maximum rate over his class b.
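
The rescaling in (5.5) is easy to carry out on a grid. The following sketch is illustrative only: the interval $\Omega = [0,2]$, the partition $A_1 = [0,1]$, $A_2 = (1,2]$, the nominal density $p_0(x) = (1+x)/4$, the masses $w = (0.7, 0.3)$ and the function name are all hypothetical choices of ours, and integrals are approximated by midpoint Riemann sums.

```python
def least_favorable_density(p0, cell_of, w, dx):
    """Scale p0 on each A_j so that the scaled density has mass w_j there, as in (5.5).

    p0      : nominal density values on a uniform grid with spacing dx
    cell_of : index j of the partition set A_j containing each grid point
    w       : prescribed masses w_j
    """
    nominal_mass = [0.0] * len(w)
    for value, j in zip(p0, cell_of):
        nominal_mass[j] += value * dx  # Riemann sum for P_0(A_j)
    return [w[j] / nominal_mass[j] * value for value, j in zip(p0, cell_of)]

# Hypothetical setup: Omega = [0, 2], A_1 = [0, 1], A_2 = (1, 2], p_0(x) = (1 + x) / 4.
N = 2000
dx = 2.0 / N
grid = [(i + 0.5) * dx for i in range(N)]  # cell midpoints
p0 = [(1.0 + x) / 4.0 for x in grid]
cell_of = [0 if x <= 1.0 else 1 for x in grid]
w = [0.7, 0.3]

q = least_favorable_density(p0, cell_of, w, dx)

# Mass that q assigns to each partition set; these should reproduce w.
q_mass = [sum(val * dx for val, j in zip(q, cell_of) if j == k) for k in range(len(w))]
print(q_mass)
```

Within each $A_j$ the ratio $q/p_0$ is the constant $w_j / P_0(A_j)$, which is the discrete counterpart of the derivative being constant on each partition set. The sketch presumes $P_0(A_j) > 0$ for every $j$.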

So  we  have  seen  that  by  considering  the  extended  p-point  class  the 
results  of  [6], [7], [32]  can  be  applied  to  the  usual  p-point  class  versus  a 
single  distribution.  In  many  cases,  the  problem  of  one  p-point  class 
versus  another  can  be  handled  in  a  similar  manner.  We  illustrate  this 
possibility  in  a  simplified  case. 

Let $A$ and $B$ be Borel subsets of $\Omega$ such that $\bar{A} \subset B^\circ$. Let $\mathcal{P}_1$ and $\mathcal{P}_0$
be p-point classes as in (5.1) with $n = 2$ and based on $A$ and $B$, respectively.
That is, let

\[
\begin{aligned}
\mathcal{P}_1 &= \{P \in \mathcal{M} \mid P(A) = w_1^1,\ P(A^c) = w_2^1\}, \\
\mathcal{P}_0 &= \{P \in \mathcal{M} \mid P(B) = w_1^0,\ P(B^c) = w_2^0\}.
\end{aligned} \tag{5.7}
\]

We can further assume that $w_1^1 > w_1^0$ or that $w_2^1 < w_2^0$, because otherwise we
would have $\mathcal{P}_0 \cap \mathcal{P}_1 \ne \emptyset$ and the problem would be trivial. Again, it is
not difficult to show from [6], that if $v_1$ and $v_0$ are the 2-alternating
capacities determining the extended p-point classes corresponding to $\mathcal{P}_1$
and $\mathcal{P}_0$ of (5.7) then $dv_1/dv_0$ is constant on $A$ and on $B^c$ (note that $A \cap B^c = \emptyset$)
and that any pair $(Q_0, Q_1) \in \mathcal{P}_0 \times \mathcal{P}_1$ satisfying $Q_0(A') = Q_1(A')\,w_1^0/w_1^1$,
$\forall A' \subset A$, and $Q_1(B') = Q_0(B')\,w_2^1/w_2^0$, $\forall B' \subset B^c$, is least favorable for $\mathcal{P}_1$
versus $\mathcal{P}_0$.

At the beginning of this example we assumed that $\bar{A} \subset B^\circ$. One reason
for this is that if there was a point $x$ which was contained in the boundary
of $A$ and the boundary of $B$ and if, say, $w_1^1 + w_2^1 = w_1^0 + w_2^0 = 1$ then the
$Q \in \mathcal{M}$ which satisfies $Q(\{x\}) = Q(\Omega) = 1$ also satisfies $Q \in \bar{\mathcal{P}}_0 \cap \bar{\mathcal{P}}_1$
where, for $i = 0, 1$, $\bar{\mathcal{P}}_i$ is the extended p-point class corresponding to $\mathcal{P}_i$.
In this case, $(Q,Q) \in \bar{\mathcal{P}}_0 \times \bar{\mathcal{P}}_1$ is least favorable for $\bar{\mathcal{P}}_1$ versus $\bar{\mathcal{P}}_0$ but
$(Q,Q) \notin \mathcal{P}_0 \times \mathcal{P}_1$. Thus the approach used above will not work in this
case, and we must note that this case is important. For example, for the
robust linear smoothing application, if we used power measurements from
a bank of low pass filters (as suggested in [41]) to determine p-point
classes to model our uncertainty about the signal and noise spectra then
the boundary points would be the same for both classes and the corresponding
extended p-point classes would overlap. Of course, this case can be handled
more directly as in [25].


3.  Conclusions 


In  this  chapter  we  have  considered  an  uncertainty  class  which  is 
appropriate  for  many  applications.  We  have  shown  that  by  embedding  this 
class  in  a  slightly  larger  2-alternating  capacity  class  the  results  of 
[6],  [7],  [42]  can  be  shown  to  hold  for  the  original  class  in  most  cases. 
Actually, in those cases where this approach cannot be utilized, a more
direct approach can be shown to work [58].

VI. SUMMARY AND CONCLUSIONS

In  this  thesis  several  problems  in  robust  statistical  signal  processing 
have  been  considered.  In  this  final  chapter  we  briefly  summarize  the  results 
obtained  and  propose  some  related  topics  for  possible  further  research. 

In Chapters II and III and the appendix, we presented a varied selection
of numerical results which indicate that the robust Wiener and
Wiener-Kolmogorov filters, developed in [1], [2], [25] and in Chapter III, are
often preferable to the corresponding traditional filters in situations
where deviations from assumed spectra might occur. In Chapter III, we also
gave a method of obtaining robust n-step predictors and robust n-lag
smoothers. Further, we illustrated in the case of robust one-step noiseless
prediction how this method could be used to design robust signal
estimators utilizing least-favorable pairs from an analogous robust
hypothesis testing problem. One possible topic for further work is extending
this result to cases other than those considered in Sections 4
and 5 of Chapter III; perhaps, by generalizing Proposition 3.1 and, if
needed, developing error expressions for those cases not treated in [3],
[15]-[19]. Another subject which needs to be considered concerns the
implementation of these robust filters. We have, in this study, made the standard
assumption that we have knowledge of the infinite past. For most applications
this is an unrealistic assumption; thus, an examination of the effects of
finite memory on robust signal estimation would be of interest.

In Chapter IV, we introduced the notion of a 2-alternating generalized
capacity because several of the most important uncertainty classes must be
restricted to compact spaces in order to be capacity classes, but are
generalized capacity classes even if the space is not compact. We then
developed a Huber-Strassen derivative between a 2-alternating generalized
capacity and a $\sigma$-finite measure and defined a distribution which Theorem
4.2 guarantees to be least favorable for this problem if any
least favorable distribution exists. It was also shown that, for the
problem of a capacity class versus a $\sigma$-finite measure, a least favorable
distribution always exists. Finally, in Chapters IV and V two uncertainty
classes, the band model and the extended p-point model, were shown to be
2-alternating capacity classes. The fact that the band model is a capacity
class even if the underlying space is not compact is especially
significant in view of the results of Chapter IV (especially Corollary 4.1).
The significance of the extended p-point class is that the p-point class,
which is appropriate for many applications, is contained in it and, in
many cases, results obtained for capacity classes can be applied to the
p-point class directly from the corresponding extended p-point class
results.

Further  study  regarding  the  topics  of  Chapters  IV  and  V  might  be 
directed  toward  finding  a  Huber-Strassen  derivative  between  two  generalized 
capacities  and  a  corresponding  least  favorable  pair  of  distributions.  Such 
a  result  would  allow  the  full  generality  of  the  Huber-Strassen  theory  to 
be applied to a larger class of problems.

APPENDIX

The purpose of this appendix is to give a further selection from our
numerical study of robust Wiener filtering (see Chapter II, Section 3).
Figures 10-14 give further results regarding the examples presented in
Chapter II. In particular, for the p-point example of Chapter II, Figures
10 and 11 give the performances of the nominal filter $H_0^*$ and of the robust
filter $H_R^*$ for wider noise bandwidths $\alpha_N$. In Figure 10, $\alpha_N = 100$ and, in
Figure 11, $\alpha_N = 1000$ (recall that in Figures 1 and 4 we had $\alpha_N = 10$). Note
that as $\alpha_N$ increases the performance of $H_0^*$ at $(\sigma_0, \nu_0)$ improves but at its
worst case $H_0^*$'s performance degrades further. Meanwhile $H_R^*$'s performance
changes little. Figures 12, 13 and 14 give further results for the
$\epsilon$-contaminated example of Chapter II. In Figures 2 and 5, we had $\alpha_N = 1000$;
in Figures 12, 13 and 14 we have $\alpha_N = 100$, $\alpha_N = 10^4$ and $\alpha_N = 10^6$, respectively.
Again we see that $H_R^*$ is relatively insensitive to changes in the noise
bandwidth, but $H_0^*$ is not.

Recall that the example which we have referred to here and in Chapter
II as the $\epsilon$-contaminated example involved robust filtering of an $\epsilon$-contaminated
first-order Markov signal in $\epsilon$-contaminated first-order Markov noise. In
Figures 15-22, we consider other nominal signal and noise models. In
particular, Figure 15 gives the performances of $H_0^*$ and $H_R^*$ when the nominal signal
and noise power spectral densities, $\sigma_0$ and $\nu_0$, are second-order Markov, i.e.

\[
\sigma_0(\omega) = \frac{4\,\alpha_S^3\,v_S^2}{(\alpha_S^2 + \omega^2)^2}
\]

and

contaminated  example  from  Chapter  II,  s  “.1  ,  a  *  1 
=  10'4,  (from  top  to  bottom)  HQ  at  (jq,'Jq),  HR  at  (a 


it  (j  ,  <j  )  (H_'s  worst  case),  H_ 


at  its  worst  case. 
Figure 19. $\epsilon$-contaminated second-order Markov signal in $\epsilon$-contaminated first-order Markov noise, $\epsilon = .1$, $\alpha_S = 1$, $\alpha_N = 10^6$, (from top to bottom) $H_0^*$ at $(\sigma_0,\nu_0)$, $H_R^*$ at $(\sigma_0,\nu_0)$, $H_R^*$ at $(\sigma_L,\nu_L)$ ($H_R^*$'s worst case), $H_0^*$ at its worst case.


$\epsilon$-contaminated second-order Markov signal in $\epsilon$-contaminated first-order Markov noise, $\epsilon = .01$, $\alpha_S = 1$, $\alpha_N = 10^3$, (from top to bottom) $H_0^*$ at $(\sigma_0,\nu_0)$, $H_R^*$ at $(\sigma_0,\nu_0)$, $H_R^*$ at $(\sigma_L,\nu_L)$ ($H_R^*$'s worst case), $H_0^*$ at its worst case.


$\epsilon$-contaminated second-order Markov signal in $\epsilon$-contaminated first-order Markov noise, $\epsilon = .01$, $\alpha_S = 1$, $\alpha_N = 10^6$, (from top to bottom) $H_0^*$ at $(\sigma_0,\nu_0)$, $H_R^*$ at $(\sigma_0,\nu_0)$, $H_R^*$ at $(\sigma_L,\nu_L)$ ($H_R^*$'s worst case), $H_0^*$ at its worst case.


Figure 22. $\epsilon$-contaminated first-order Markov signal in $\epsilon$-contaminated band-limited white noise, $\epsilon = .1$, $\alpha_S = 1$, $\alpha_N = 10^6$, (from top to bottom) $H_0^*$ at $(\sigma_0,\nu_0)$, $H_R^*$ at $(\sigma_0,\nu_0)$, $H_R^*$ at $(\sigma_L,\nu_L)$ ($H_R^*$'s worst case), $H_0^*$ at its worst case.


In Figures 16-21, $\sigma_0$ is second-order Markov and $\nu_0$ is
first-order Markov. In all these cases (Figures 15-21) both signal and noise
uncertainty are modeled via $\epsilon$-contaminated classes. In Figures
15-19, we have $\epsilon = .1$ and $\alpha_N$ varying, and we see that the
results are fairly similar to those already presented when both $\sigma_0$
and $\nu_0$ are first-order Markov. In Figures 20 and 21, we have
$\alpha_N = 10^3$ and $\alpha_N = 10^6$ as in Figures 18 and 19,
respectively; however, we have $\epsilon = .01$ in Figures 20 and 21. Note
the surprisingly strong similarity between Figures 18 and 20 and between
Figures 19 and 21. Finally, Figure 22 is included to substantiate the claim
made in the penultimate paragraph of Chapter II. Figure 22 gives
performances for the case when $\sigma_0$ is an $\epsilon$-contaminated
first-order Markov spectrum and $\nu_0$ is $\epsilon$-contaminated
band-limited white noise. In Chapter II, we claimed that even when the
bandwidth of the $\epsilon$-contaminated band-limited white noise is very
large (in Figure 22 it is $10^6$) the results are similar to the other cases
and unlike those involving non-band-limited white noise as in Figure 3.
Compare Figure 22 with Figures 14 and 19, for example; they are virtually
identical.
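As a small check on the second-order Markov nominal model used for Figures 15-21, the sketch below integrates such a spectrum numerically. It assumes the density has the form $4\alpha^3 P/(\alpha^2+\omega^2)^2$, with $P$ playing the role of the total power. That form, the parameter names and the function names `markov2_psd` and `total_power` are assumptions made for illustration, not definitions taken from the thesis.

```python
import math

def markov2_psd(omega, alpha, power):
    """Assumed second-order Markov PSD: 4*alpha^3*power / (alpha^2 + omega^2)^2."""
    return 4.0 * alpha**3 * power / (alpha**2 + omega**2) ** 2

def total_power(psd, n=20_000):
    """Approximate (1/2pi) * integral of psd(omega) over the real line."""
    # omega = tan(theta) maps (-pi/2, pi/2) onto the real line; the midpoint
    # rule never evaluates the endpoints, where tan diverges.
    h = math.pi / n
    total = 0.0
    for k in range(n):
        omega = math.tan(-math.pi / 2 + (k + 0.5) * h)
        total += psd(omega) * (1.0 + omega**2)  # d(omega) = (1 + omega^2) d(theta)
    return total * h / (2.0 * math.pi)

# The integrated power should equal `power` for any bandwidth alpha.
for alpha in (1.0, 10.0, 1000.0):
    p = total_power(lambda w, a=alpha: markov2_psd(w, a, 1.0))
    print(f"alpha = {alpha:g}: total power = {p:.6f}")
```

With this normalization the bandwidth parameter changes only the shape of the spectrum and not its power, so a second-order Markov signal can be compared with a first-order Markov signal of equal power.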